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This work describes Cosmic Microwave Background (CMB) data analysis algorithms and 
their implementations, developed to produce a pixelized map of the sky and a corresponding 
pixel-pixel noise correlation matrix from time ordered data for a CMB mapping experiment. 
We discuss in turn algorithms for estimating noise properties from the time ordered data, 
techniques for manipulating the time ordered data, and a number of variants of the maximum 
likelihood map-making procedure. We pay particular attention to issues pertinent to real CMB 
data, and present ways of incorporating them within the framework of maximum likelihood 
map-making. Making a map of the sky is shown to be not only an intermediate step rendering 
an image of the sky, but also an important diagnostic stage, when tests for and/or removal of 
systematic effects can efficiently be performed. The case under study is the maxima-i data set. 
However, the methods discussed are expected to be applicable to the analysis of other current 
and forthcoming CMB experiments. 

PACS numbers: 98.70.Vc, 98.80.Bp, 98.80.Es 



I. INTRODUCTION 

This paper presents a comprehensive set of data analy- 
sis methods aiming at the production of a map of the sky 
and an accurate estimate of map uncertainty in a case 
of CMB mapping experiments. We describe a variety 
of maximum-likelihood-based map-making methods, and 
discuss their performance in the analysis of the maxima-i 
data set. 

MAXIMA is a balloon-borne experiment built pri- 
marily in Berkeley |^ and designed to make a number 
of short-duration flights. To date the maxima team 
has published the results of the first flight of the instru- 
ment m ^ , consisting of a high-resolution map of almost 
100 square degrees of the microwave sky, together with a 
power spectrum of the CMB anisotropics observed in the 
map covering a broad range in £ space from ^ ~ 35 up to 
~ 1235, corresponding to angular scales from 5 degrees 
down to 5 arcminutes. Such products are final results of 
an involved data analysis pipeline described in this pa- 
per. The complexity and size of this data set have proven 
to be a significant challenge for data analysis methods 
setting demanding requirements for both their precision 



and speed. The challenge which our methods and tools 
are designed to meet. With other complex and advanced 
CMB experiments in progress and anticipated (includ- 
ing the satellite missions, MAP Q and Planck Q), these 
tools and methods can be expected to be of wider interest 
and applicability. Describing the details of the maxima-i 
data analysis is another goal of this paper. 

The structure of this work is as follows; in Sect. |1 we 
deal with the data in the time domain, focusing on data 
pre-processing and noise estimation, including an outline 
of the basic features of the maxima-i data set, and of the 
simulation tools used to test our map-making pipeline. 
Sect. Ill is devoted to the description and comparison 
of a suite of different map-making methods. Those si- 
multaneously produce both a map and a corresponding 
pixel-pixel noise correlation matrix. Although the al- 
gorithms are all based on the maximum likelihood ap- 
proach, they differ in the way they attempt to optimize 
the balance between accuracy and speed. We demon- 
strate their performance in analyzing MAXiMA-like sim- 
ulations as well as the actual MAXIMA-I data set. In 
Sect. IV we discuss ways of handling systematic effects 
within the general framework of maximum likelihood 
map-making. Although such systematics are inevitable 
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Section IIa,b 
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TABLE I: A summary of the notation used in this paper. 
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Noise stream 
estimation: 
Section lie, V 



Time ordered 
data stream: 
Section IIa,b 



Noise power 
spectrum estimation: 
Section IIc,d 



Processed time 
stream data: 
Section lie 



Noise correlation 
matrix in pixel 
domain: 
Section III, IV 



Map: 
Section III, IV 



Consistency tests of map— malcing products: 
Section VI. 



FIG. 1: A flow-chart showing the layout and inter- 
dependencies of the sections of this paper, as well as mutual 
relations of different stages of the data analysis pipeline as 
described in the text. 



in real CMB data sets, they are rarely considered in 
more theoretical accounts of CMB data analysis (e.g., 
@, |, |, |ll|, |l|, |l|). In Sect. V we combine 
these elements, and consider some practical aspects of 
recently-proposed iterative algorithms for time-domain 
noise estimation [^l], In Sect. VI, we complete our 
presentation with a description of the numerical tests we 
have developed to check consistency of our analysis. 

The inter-dependencies of different sections of this pa- 
per are depicted in Fig. |l|. 

In this paper we do not consider issues related to 
the subsequent statistical investigation of these maps, 
such as tests for Gaussianity or power spectrum es- 
timation. Our map-making methods are intended to 
be as general as possible, and because they provide 
both a map and a pixel-domain noise correlation ma- 
trix, they do not restrict the subsequent choice of statis- 
tical tool. Methods for obtaining an angular power spec- 
trum from, or for searching for non-Gaussianity in, such 
maps have been described in a number of recent papers 
(e.g., 0, [11, ^, 1^, H, H and ||, H 



re- 



spectively). The power spectra shown in this paper have 
been computed using a generic uncustomized version of 
a quadratic estimator as implemented in the publicly- 
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AC low-pass electronic filter 
AC high-pass electronic filter 
bolometer low-pass filter 
sky signal 
a-synchronous signal 
time stream noise 
time domain noise power spectrum 
prewhitened time domain noise spectrum 

prewhitening filter 
noise correlation length in time domain 
spectrum smoothing window function 
time domain noise correlation matrix 
circulant part of Nt 
sparse part of Nt 
pointing matrix of the experiment 
pixelized sky = map 
noise correlation matrix in pixel domain 
circulant part of Np 
sparse part of Np 
number of pixels in a map rUp 
length of the time stream segment 
pointing matrices of synchronous effects 
extra fictitious pixels 
generalized pointing matrix 
generalized map 
generalized noise correlation matrix 
time domain template 
Kronecker delta 
vector of ones 
singular pixel domain eigen-vector 
CMB anisotropy power spectrum 
total signal-f noise correlation matrix 
CMB signal correlation matrix 
Cholesky factor of matrix Np 
decorrelated map 
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available MADCAP package ||, |9[. 

A comprehensive discussion of the maps and power 
spectra produced by the maxima-i experiment can be 
found elsewhere l3[_ % |3C|] ; th eir cosmological implications 
are discussed in p£ , |3l|, 32 1. 

Hereafter, we denote vectors and scalars with bold and 
non-bold lower case letters (either Roman or Greek) re- 
spectively, while matrices and operators are denoted with 
either bold or caligraphic upper case letters. Vector and 
matrix components are indexed in parentheses, rather 
than by subscript; subscripts and superscripts are used to 
distinguish between different variants of a given quantity, 
e.g., p and t denote a pixel and a time domain quantity 
respectively. A tilde over a quantity denotes its Fourier 
transform, i.e., 

g{f)= I dtgit)expi2nLft). 
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Occasional failures of these good intentions are also ac- 
knowledged. A summary of the most frequently used 
symbols is given in Table ^ 



MAXIMA-I 
single detector data 



II. TIME ORDERED DATA 

A. The MAXIMA-I data set 

The MAXIMA-I data and instrument have been de- 
scribed in H, ^. The maxima-i data set consists of 
approximately 2,300,000 measurements for each of 16 
photometers and 4 dark channels used to monitor the 
experiment. To date only data from six of the detec- 
tors have been analyzed. These include four photome- 
ters sensitive to CMB photons (3 with frequency band- 
widths centered on 150 GHz and 1 on 240 GHz), one 
photometer monitoring atmosphere and foregrounds at 
410 GHz, and one 'dark' bolometer (screened from in- 
cident photons) used to search for systematic problems. 
Each of the data streams for each of these detectors is 
divided into two parts, hereafter called the CMBl and 
CMB2 scans respectively (see Fig. |2|) . These scans were 
taken at different elevations and were separated in time 
by ~ 1.5 hours. Projected on the sky, they largely overlap 
one another, creating a well-crosslinked map. Each of the 
two scans is further subdivided into 11 (CMBl scan) and 
10 (CMB2 scan) data subsets whose lengths range from 
30, 000 to 250, 000 time samples. These disjoint stretches 
of contiguous data - hereafter referred to as time stream 
segments - are defined by the requirement that the noise 
within a segment be approximately stationary. The noise 
correlations between the segments are guaranteed to be 
negligible by discarding a sufficiently long section of the 
time stream data between neighboring segments. In ad- 
dition, each of the segments has an overall offset and 
gradient subtracted independently. 

Not all measurements within a segment are to be in- 
cluded in the further analysis. Measurements compro- 
mised by glitches - cosmic rays hits, telemetry drop-outs, 
or other short transients - are flagged as 'bad' and con- 
stitute gaps in a segment. We usually determine about 
2 — 3% of the time samples as gaps. 

The signal and noise are subjected to several filters be- 
fore being recorded, including a low-pass filter due to the 
detector time constant and subsequent AC low- and high- 
pass filters in the readout electronics. These filters are 
phase-shifting (i.e., their Fourier space representation is 
by complex numbers) and they define the temporal fre- 
quency response of the instrument to a band between 
/ ^ O.OlHz and / ^ 20Hz. This frequency response to- 
gether with the scan strategy give maxima-i sensitivity 
to the sky features in the range of angular scales from 
^ 5 degrees down to ^ 5 arcminutcs. The instrumental 
filters must be deconvolved from the time ordered data 
in the course of the data analysis. The functional form of 
the electronic filters is accurately measured in the labo- 
ratory. The detector time constant is measured using the 




3 4 



9 10 



11 



7 



i 


i 


i 


3 


4 


5 


8 


9 


10 



Time stream segments 



FIG. 2: A structure of the maxima-i data set. 



in-flight response of the detectors to a known source - in 
the case of maxima-i, Jupiter. For our purposes here it is 
assumed that the instrumental filters are known to a neg- 
ligible uncertainty in the range of frequencies of interest, 
i.e., ;S 50Hz. (Note that the bolometer time constant has 
an uncertainty of ^ 0.5 — 1.0 ms. This error is included 
in the analysis of the maxima-i data ^, |3^, but we 
neglect it for the purpose of this paper.) The data sam- 
pling interval is fixed throughout the entire observation 
at A = 0.0048 s. Hereafter, we identify an observation 
in the time domain (including gaps) by a global sample- 
number index. 

The maxima scanning strategy includes an azimuthal 
primary mirror chop with a frequency ~ 0.45 Hz su- 
perimposed on a slow azimuthal balloon gondola oscil- 
lation with a characteristic frequency of ^ 0.01 Hz. Both 
the position of the primary mirror with respect to the 
gondola and the azimuthal position of the gondola were 
recorded during flight, enabling the tracking of primary 
mirror and/or gondola synchronous systematic effects. 

The MAXIMA pointing is reconstructed using the obser- 
vations of guide stars made throughout the flight. With 
an rms accuracy better than ~ 1', pointing uncertainty 
is neglected during the map-making stage, although it 
can, and indeed should, be included as a systematic un- 
certainty of the final power spectrum (e.g., [Q). 

The time stream data are assigned to pixels prior to 
map-making. The relatively small angular size of the 
sky patch observed by maxima-i allows us to use sim- 
ple square pixels on a grid whose rows are at constant 
declination. Due to computational constraints, most of 
the tests discussed in this paper have been performed us- 
ing 8 arcminute pixels, although the highest resolution 
MAXIMA-I results have been computed using 3 arcminute 
pixels . 

The independence of a pixel's size on its sky position 
enables us to account for the extra smoothing due to the 
pixelization by u sing a simple, albeit approximate, pixel 
window function |33[ . This approximation usually breaks 
down at the smallest angular scales, due to the lack of 
uniformity of the sky sampling on these scales. Ways of 
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alleviating this problem are discussed elsewhere ^ . The 
methods discussed below are independent of the assumed 
pixelization scheme. 



B. Time stream modeling and simulations 

Drawing from the maxima-i experience, we now ex- 
plicitly list all the features of the time stream data which 
we have found to be important in the data analysis. We 
also briefly explain how we incorporate these features into 
our simulations, which are then used for tests of our data 
analysis tools. 



1 Time stream model 

We denote the entire raw time stream from a single 
detector, including the effect of the instrumental filters, 
as dp. As noted above, this is subdivided into a number 
(riseg) of disjoint segments, so that we can write dp = 
[f-ld^^. 

Every measurement contains contributions from both 
the sky and the instrument, and is modeled as, 

(0 = ^ {hi) [tsky (7 {])) +x{cx ij))] + nt (z) . 

3 

(1) 

F denotes the effect of all of the instrumental filters, 
and for MAXIMA is therefore a convolution of three fil- 
ters ~ F = Fhigh * Fiov^ -k Fboio - corresponding to the 
AC high-pass, AC low-pass, and bolometer low-pass fil- 
ters, tsky i'J (i)) is the temperature of the sky in the 
direction f (i) observed at time i. x is any extra sys- 
tematic 'a-synclironous' effect (e.g., primary mirror chop 
synchronous noise) which only depends on a known pa- 
rameter a (e.g., the mirror position). The dependence 
of X (a) on time is, therefore, exclusively a result of the 
time-dependence of the parameter a., rit (i) denotes the 
total (Gaussian, correlated) instrument noise in observa- 
tion i. In fact, independent noise components are intro- 
duced into the time stream at four different stages in the 
instrument and, more precisely, the total noise is repre- 
sented as 

nt = Fhigh [Fio-w [Fboiotiti + nt2] + nta] + «t4- (2) 



Here nti , nt2 , nta and n,t4 denote the independent noise 
components added to the signal prior to the bolometer 
low-pass, AC high-pass, AC low-pass filtering, and signal 
digitization, respectively, nti and nt2 components to- 
gether are expected to dominate the total instrumental 
noise with only riti component displaying a 1// behav- 
ior at low frequencies attributable to temperature fluc- 
tuations of the detector. Though all such features are of 
importance for proper forecasting and simulations of a 
performance of an experiment, none of these needs to be 
assumed in the noise estimation described in Sect. 11 C, 



which directly estimates the total noise, nt- The instru- 
mental noise is assumed to be Gaussian and stationary 
within each segment, and is described by a (segment- 
dependent) noise power spectrum, P{f). Each segment 
is assumed to consist of measurements evenly spaced in 
time. However, the data constituting gaps are not to be 
included in a final map. 



2 Simulations 

In order to test our data analysis pipeline we want to 
be able to simulate the time stream of a MAXIMA-I like 
experiment. The simulation is designed to incorporate 
all the important features of the actual data set listed 
above, including the gap and segment structure, scanning 
strategy, and an approximate (symmetric) beam 

The simulated time stream is described by Eq. (|^). 
The sky signal (tsky ) is generated by applying the known 
pointing solution to a simulated CMB sky, generated as 
a high-resolution Gaussian realization given some fidu- 
cial cosmological parameters. We also include a primary 
mirror chop synchronous systematic effect, x (a), in our 
simulations, varying both its functional dependence on 
the primary mirror position and its amplitude. The in- 
strumental filters are then applied to the simulated time 
stream as described by Eq. (|l|). Their functional form 
follows that of actual maxima filters. 

Finally we add Gaussian correlated noise to each time 
sample. This is modeled according to Eq. (^ assuming 
that each component, nti, has a power spectrum in a 
form given by. 



(/) 



a" I 1 

^ Sims * 



fkn 



f 



(3) 



This approach means that our simulations mimic the 
range of floating point operations required in analyzing 
the real data, giving us some insight into possible numer- 
ical error accumulation in our data analysis pipeline. 



C. Time stream processing and noise estimation 

We now describe the procedure for simultaneously 
estimating the time-domain noise power spectrum and 
restoring the stationarity of time stream noise by re- 
filling the flagged-data gaps in the time stream. 

Typically, map-making methodologies assume that the 
time-domain noise power spectrum is precisely known, 
even though in practice it has to be estimated from the 
time stream data itself. The error in this estimation is 
therefore not included in general in the end-to-end data 
analysis error budget (although see [|ll| for a possible 
way of tackling this issue). This fact clearly highlights 
the need for a high level of precision at this stage. 

The time stream model described by Eq. (P is quite 
complex. Some of its features are of less importance for 
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FIG. 3: A fl ow-ch art of the noise estimation procedure out- 
lined in Sect. [IC. Objects inscribed in ovals are either input 
or output products of the procedure. This procedure can be 
incorporated as anart of the iterative noise estimation as de- 
scribed in Sect. M. In this case, its products (noise power 
spectrum and a processed time stream) are used to generate 
a sky map, which is one of input objects as shown in this 
chart. 



this Section so, for simplicity, we start by assuming that 
the time stream data contains only Gaussian noise (i.e., 
no sky or systematic signals, tgky, x <C fit) and focus on 
resolving problems in the noise estimation arising from 
the presence of gaps in the time stream and of noise cor- 
relations on time scales comparable to the length of an 
entire data segment. 

The presence of discontinuities (gaps) in the time 
stream poses a two-fold problem. It is an obstacle both 
to estimating the time stream noise spectrum and to de- 
vising a fast and efficient map making algorithm (see 
Sect. [II). It is therefore desirable to restore time stream 
continuity without compromising the stationarity of the 
noise or sacrificing too much of the data. These two 
goals, estimating the noise power spectrum and restor- 
ing the continuity of the time stream, are achieved by a 
procedure described below (see Fig. ^ for its synopsis). 



1 A pure noise time stream 

The initial input at this stage consists of the time 
stream data with the instrtmrental filters convolved, dp. 



The filters can be deconvolved in the frequency domain, 
d{t)^ f df dp if) (/) exp {~2nLft) . (4) 



This deconvolution could be done immediately, prior to 
any further time stream manipulation, but this is not rec- 
ommended due to the high-frequency noise it adds to the 
time stream (see Figs. ^ & ||). Although this noise would 
not be a problem for our formalism in theory, it can be 
a source of significant numerical error in any practical 
implementation, possibly biasing the final results. This 
numerical error is usually incurred while Fourier trans- 
forming the noise power spectrum. Instrumental filter 
deconvolution causes the noise power at the highest fre- 
quencies to dominate the total power of the time stream 
noise by many orders of magnitude, and consequently a 
small error in the total noise power estimate translates 
into a substantial error in the noise power estimate in the 
lower range of frequencies of most interest. The remedy is 
either to leave the instrumental filter deconvolution until 
after the noise estimation stage, or to perform the decon- 
volution at the beginning but concurrently convolving an 
extra (real) filter, Xreif)-, to suppress the high frequency 
noise, i.e., 



d{t) 



df dp if) Xre if) F-^ if) exp (-27rt/t) . (5) 



In this conventional choice is to apply a real fil- 

ter given in the frequency domain as the amplitude of 



Fif) 



In 



the total instrumental filter, i.e., Xre (/) 
some map-making algorithms this filter can be decon- 
volved from the time stream data once the noise power 
has been estimated and the noise stationarity restored 
so that effectively no filtering has been applied to the 
time stream data due the course of data analysis. This 
is the case when the noise correlation matrix in the time 
domain does not have to be computed explicitly, as in, 
e.g., the approximate map-making algorithms discussed 
below. In these cases the precise shape of the extra filter 
is not important. If, however, a full noise correlation ma- 
trix in time domain is needed, as in exact map-making 
algorithms, such a deconvolution should be avoided. In 
this case the precise form of the extra filter is very im- 
portant, and should be designed to suppress the high fre- 
quency noise power sufficiently, while leaving the widest 
possible range of lower frequencies unaffected. 

In the following we set the initial width of all the gaps 
to be no less than twice the effective width of the in- 
strumental filters (which we conservatively choose to be 
100 time samples, see Fig. refiecting our ignorance 
about the origin of some of the gaps. Similarly, we al- 
ways widen gaps on both sides by an effective filter width 
when (de)convolving any filter from the raw time stream. 

The noise estimation algorithm described here is ap- 
plied to each time stream segment separately and consists 
of the following steps: 




FIG. 4: Filters used in the analysis of maxima-i data. Left panel shows a frequency domain representation of the filters. Thin 
solid and dashed lines show the (absolute values of) real and imaginary parts of the total instrumental filter, F (/), and the 
thick line its complex amplitude, \F (/) j. The dot-dashed line depicts the real prewhitening filter, W (/), as defined in Eq. (^) 
(2/3 = 1.0). The time domain representation of these filters shown in middle panel: solid line - total instrumental filter and 
dashed line its amplitude; and right panel - the prewhitening filter. The effective time domain widths of these filters assumed 
in the analysis are 100 time samples for both instrumental filters and 50 time samples for the prewhitening filter. 



1. Reducing the noise correlation length, Ac-' 

The time stream segment is convolved with a 
prewhitening filter, 

dpw (/) = W if) d~F (/) , (6) 

and simultaneously the gaps are widened to ac- 
count for the width of this filter. The prewhitening 
filter (W) is usually a half-differencing real filter 

W{f) = sin^{cf>/2), (7) 

with a phase = tt/A and a sampling interval A. 
The prewhitening index, reflects the expected 
low frequency behavior of a noise power spectrum, 
i.e., P(/) ~ /-^'^ for / ^ 0. It is adjusted 
for each segment separately (Fig. ||), so that the 
prewhitened power spectrum, Pw (/) ^ CONST for 
/ — > 0. The functional form of the prewhitening 
filter is tested in the subsequent stages of the data 
analysis, and, if necessary, it may be refined and 
the entire procedure repeated. 

2. Estimating the noise power spectrum of the 
prewhitened time stream, Pfw (/)• 

Sections of the time stream segment are located 
that have no gaps, are longer than the expected 
noise correlation length, and are separated from 
each other by at least this distance. These are used 
to compute a series of statistically independent es- 
timates of the noise power spectra using standard 
FFT-based methods (e.g., [^). These estimates 
are then checked for stationarity and, if consis- 
tent with one another, averaged. If they do not 



exhibit stationarity then the original time stream 
segments have to be redefined and made appropri- 
ately shorter. The final - average - power spectrum 
of the stationary noise in a segment can be addi- 
tionally smoothed and extra(inter)polated to the 
frequencies corresponding to the full segment. 

3. Restoring time stream continuity: 

The gaps are filled with a constrained realization of 
the noise which is assumed to be Gaussian with 
a power spectrum, Ppw if), as determined on the 
previous step and corresponding noise correlation 
matrix (see Eq. ^0|), denoted as Nt below. The 
noise within each gap is constrained only by good 
data samples in its vicinity, i.e., those which are 
within the noise correlation length, Ac, of the gap 
edges, 

dpw («) = ^ («) + (8) 
-f ^ TVt {i, m) AT;"^ (to, n) {dpw (m) - | (to)) . 

Here, N'^ is a 2Ac x 2Ac matrix describing noise cor- 
relations between the time samples just outside of a 
given gap, ^ is a vector of Gaussian, correlated (but 
unconstrained) random variables, and i indexes the 
time samples within a gap. The sum is over to and 
n, both of which index the time samples outside of 
a given gap which are being used to constrain the 
noise realization within the gap. Due to required 
inversion of the matrix the computational cost 
of the procedure is O (A^) per gap, and it is there- 
fore crucial that Ac is sufficiently short. 
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FIG. 5: Estimated noise power spectra for simulated and real data. In left panel, the dashed line shows an effective noise power 
spectrum with 2/9 = 1.0 used for a simulation (its amplitude is scaled down by a factor of 5 to avoid overlapping with solid line). 
Solid and dotted lines show noise power spectra recovered from this simulation using a prewhitening filter as given by Eq. (M) 
with 2/9 = 1.0 - solid line; 2/9 = 1.25 (top) and 2/3 — 0.75 (bottom) dotted lines. Middle panel shows noise power spectra for 
a single segment of the actual maxima-i data. Again the recovery has been made using various prewhitening filters as in the 
left panel. In all the results displayed in these panels aggressive smoothing have been applied to facilitate better comparison. 
For actual calculations, such an approach does not have to be the best option. Right panel shows a noise spectrum obtained 
just as a result of averaging at the second step of the noise estimation procedure, prior to any smoothing. Two small spikes 
at ~ 0.45IIz and 0.9Hz correspond to a primary mirror chop frequency and its first harmonic. The overplotted dash line is a 
smoothed version of this spectrum as shown in the middle panel with a solid line. Shaded areas in both left and middle panels 
show a region of frequencies we marginalize over to avoid either spurious high frequency noise or a noise estimation uncertainty 
at the low frequency end. 



4. Re-estimating the noise power spectrum: 

The entire time stream segment, with gaps filled, is 
now used to re-estimate the noise power spectrum. 
At this stage the low frequency behavior of the 
noise can be tested; if it is found to be significantly 
non-flat (i.e., non- white), or to differ from the spec- 
trum estimated at stage 2, then the prewhitening 
filters used at the stage 1 must be adjusted and the 
entire sequence repeated. 

5. Deconvolving the filters: 

Once the noise power spectrum has been estimated, 
and the gaps filled with the constrained noise real- 
ization, the instrumental and prewhitening filters 
can be deconvolved from both the time stream and 
the noise power spectrum. In general, this ap- 
plies to both the instrumental (or \Xre (/) |) and 
prewhitening filters. As mentioned above, the de- 
convolution of the instrumental filters is not always 
advisable since it may introduce numerical errors. 
The deconvolution of the prewhitening filter is also 
sometimes postponed, and accounted for only in 
the algebra of the map-making algorithm . Such 
an approach attempts to make use of the shortened 
correlation length of the prewhitened noise to cut 
the number of fioating point operations required 
to make the map. In practice we find that it is 
difficult to take advantage of this in any realistic 



map-making algorithm (see Sect. HI), while, as we 
argue below, performing the deconvolution at the 
outset of the noise estimation procedure helps to 
alleviate a number of problems. 

This procedure attempts to estimate the ensemble av- 
erage noise power spectrum using just one realization of 
the time stream segment. This is clearly not enough, 
especially in the presence of correlations on time scales 
comparable with the length of the segment. Prewhitening 
helps to alleviate this problem, yet it requires some extra 
knowledge about the functional form of the prewhiten- 
ing filter. In this approach, an educated guess, followed 
by iterative refinement, is then tested a posteriori. For 
a prewhitening filter of the form given in Eq. (^, we 
usually find that the power-law index f3 can be uniquely 
determined with an error not bigger than \A(3\ 0.1 by 
comparing the power spectra computed at the 2nd and 
4th stages of the procedure (see Fig. |^). 

Computing and averaging the noise power spectra of 
the independent, gap- free, sections of a segment (step 2 
above) helps to recover a better ensemble average spec- 
trum at intermediate and high frequencies. Smoothing 
in the frequency domain can also be applied to get even 
closer to the ensemble average. However, in both cases we 
need to assume that the real noise power spectrum does 
not display any significant variation on scales smaller 
than the smoothing or frequency re-sampling scale. 
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The result of this procedure is an estimate of the en- 
semble average noise power spectrum in the time do- 
main which is reliable at sufficiently high frequencies (for 
MAXIMA-I for / XL O.lHz), but sample-error dominated at 
the low frequency end (/ ^ O.lHz). To minimize the ef- 
fect of the low frequency uncertainty, in the subsequent 
analysis we marginalize over the low frequency part of 
the spectrum (see Sect. IVB ). 

The entire procedure restores the continuity of the time 
stream segments: the noise is stationary over an entire 
segment, including gaps, and the sky signal in gaps is 
zero. This is important in the subsequent stages of the 
data analysis; we might worry that adding a random (al- 
beit constrained) signal to the data introduces an extra 
degree freedom and undermines the uniqness of the re- 
sult. Although this is true, while different realizations of 
the random component may indeed lead to slightly differ- 
ent results, all of these results are necessarily statistically 
equivalent, with no bias introduced. Moreover, the im- 
pact of the extra randomness is significantly reduced by 
the g ap pixel marginalization described in Sects. IV A & 
[VB. Consequently, we find that, for maxima-i the final 



maps and their power spectra are robust and not affected 
by the random element of this procedure. 

Generating constrained noise realizations to fill the 
gaps is the most time consuming part of this procedure. 
How important this is for the consistency and accuracy 
of the entire data analysis pipeline depends on the fre- 
quency, size, and regularity of the gaps in a time stream 
segment. In the case of a handful of narrow, randomly 
scattered, gaps, an unconstrained, uncorrelated, random 
noise realization with an rms determined from the rest of 
the time stream may serve as a convenient first approxi- 
mation for the analysis. 



2 A time stream with non-negligible sky signal 

If the sky signal present in the time stream data is not 
negligible, but still sub-dominant, then the noise estima- 
tion has to be performed iteratively, following an algo- 
rithm proposed by Ferreira & Jaffe In this case we 
distinguish between the full (noise + signal) time stream 
and the noise-only time stream, where the latter is the 
noise in the time stream data. At each step of the iter- 
ation, a maximum-likelihood map estimated on the pre- 
vious step is subtracted from the time stream giving the 
current best estimate of the noise stream (see Sect. 0). 
The above noise power spectrum procedure is applied 
in full only to the noise time stream; only the instru- 
mental filters deconvolution and related gap-widening 
are performed on the actual time stream (see Fig. ||). 
Once the noise estimation has been accomplished, only 
the noise stream continuity is restored. Therefore the 
gap time samples from the noise time stream data are 
used to replace their analogues in the full (unprocessed) 
time stream. This avoids unnecessarily wasting or bias- 
ing good data samples which happen to be in a vicinity 



of a gap. 

A simple extension of this iterative approach also al- 
lows us to account for problems related to presence of 
synchronous systematic signals in the data. We discuss 
the appropriate algorithms and related issues in some de- 
tail in Sect. 0. 



D. The time domain noise correlation matrix 

Formally, maximum likelihood map-making requires 
the full time-time noise correlation matrix, rather than 
just a noise power spectrum. For an idealized time stream 
the former is just the Fourier transform of the latter. 



Nct{i,j) = y d/P(/)exp(-27rt/(z-j)A). (9) 

For a finite time stream this expression leads to a cir- 
culant matrix denoted by the subscript C [Q. If the 
correlation length in the time stream is less than half of 
the time stream length, then an alternative approximate 
correlation matrix, Nt, is given by 



Nct{i,i); if < Ac, 

0; otherwise. 



(10) 



In practise, numerical error in the estimation (and 
Fourier transform) of the noise power spectrum means 
that Nf computed in this way may not even be posi- 
tive definite. Such problems can be alleviated by the 
introduction of an extra power spectrum smoothing win- 
dow, S{f), designed to truncate smoothly the correla- 
tions once their amplitude reaches the limits of numerical 
accuracy. Again, it is prudent to apply the window func- 
tion prior to deconvolving the prewhitening filter. Eq. (|^) 
then assumes a somewhat more complicated form, 

Nct{t,j) = J df\W{f)\-^e^p{-27rif{t^j)A) 

X [drS{\f~f\)Pw{n, (11) 



where W (f) is given by Eq. (^. In the following, we 
make use of this expression whenever computing Net- 
Its inverse is approximated by an analogous formula [ p6[ , 

Nct-'ihj) = f df\W{f)\'exp{-27Tif{i-j)A) 



X df'Si\f-f\)P^'in. (12) 



Our usual choice for S{f) is a Gaussian with an appropri- 
ately tuned width (typically of 1000 — 5000 time samples 

for MAXIMA-l). 



III. MAP-MAKING: FORMALISM 
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A. The basic framework 

The procedure described above provides the input for 
the map-making per se; here, we take as given the time 
stream data with instrumental filters deconvolved and 
gaps filled, and the corresponding noise power spectrum 
for each segment. 

In this Section we consider a case of a somewhat ideal- 
ized time stream, leaving a discussion of the fully realistic 
case to Sect. IV. We consider a time stream consisting of 
a single segment, neglect most of the systematic contri- 
butions, but do admit the presence of (now filled) gaps 
in the data. The simplified Eq. (^ reads now, 



df = Arrir 



nt. 



(13) 



Here A is a pointing matrix assigning each time sample 
to an appropriate pixel (or a set of pixels in a case of 
differencing experiments) on the sky (as observed at the 
given time) with the sky signal given by a pixelized map: 
rUp. Tit is the (Gaussian) noise time stream with corre- 
lations given by the power spectrum estimated as in the 
previous Section. 

In this case we can write a closed form solution for 
the map 0, § (but see [42]). The maximum likelihood 
estimates of the map, trip, and the pixel-pixel noise cor- 
relation matrix, Np, then read. 



rrir 



(14) 



Np = (A^MA) ^ (A^MNtMA) (a^MA) (15) 



M = Nt 

ance estimates of nip and Np 



Here, M is a positive-definite symmetric matrix, and Nt 
the time domain noise correlation matrix (Eq. [lO[ ). If 
^, then Eqs. ( |l^ ) & (15) give minimum vari- 
Other choices trade a 
larger variance for increased computational speed. In 
particular Tegmark Q proposed the choice for M of the 
circulant part of the Nt matrix. We discuss this option 
in Sect. IIIC below. 



In the following we present some different approaches 
and their application to a realistic maxim A-like data set. 



B. Minimum variance approaches 

1 Exact methods 

With M = Nt^^ , we get the minimum variance esti- 
mates, 

rUp - {A^Nt-^AY^ A^Nt-^dt, (16) 
Np = {A^Nt-^Ay\ (17) 

The exact implementation of these equations seems a 
daunting task. The time stream length may easily reach 
many tens and hundreds of million of samples, making ex- 
act inversion of the time domain noise correlation matrix 



prohibitive. However, for a MAXIMA-I like experiment 
with segments lengths reaching only O (lO^) samples an 
exact implementation is feasible on a moderately power- 
ful workstation. 

At the core of the implementation lies the observation 
that for a gap-free time stream of the length rig with sta- 
tionary noise the time domain noise correlation matrix 
is Toeplitz, with Nt{i,j) = Nt{\i - j\ ,0). The inver- 
sion of a Toeplitz matrix can be performed in as few 
as O (n^) operations - clearly feasible for O (lO^) time 
samples - and even faster algorithms are possible (e.g., 
p6| ) bringing that number down to O (ris log^ Ug). The 
number of operations can be further reduced if the noise 
correlation length is shorter than the time stream seg- 
ment length and the noise correlation matrix is therefore 
band-diagonal. 

If gaps are present in the time stream then, in princi- 
ple, the Toeplitz (stationary) character of the time do- 
main noise correlation matrix is lost. However the gap- 
filling procedure described above is explicitly designed to 
restore the stationarity of the noise in the time stream. 
Since the gap samples contain no sky signal (by construc- 
tion) they can be treated as data taken while observing 
a fictitious signal- free pixel in the sky map trip. Once 
the map and its noise matrix are estimated, this extra 
gap pixel is rejected from the map and the pixel domain 
noise correlation matrix (effectively marginalizing over 
it). This is a special application of t he ext ra pixel method 
discussed in detail below (see Sect. IV A and also jSTj). 

The computational effort then scales with a number of 
pixels ripix and a number of time samples rig as: 

• noise inverse in time domain: 



Nt 



Nt 



• noise inverse in pixel domain: 
Nt~\A — > A'^Nt^^A : O { 



• noise weighted map: 

A'^Nt-^dt : O inl) 



Nt-\A,dt 



noise matrix in pixel domain: 
N-^ 



Np : O «J 



• final map: 
Np,A^Nt-'dt Np (A^Nt-'dt 



For the first three items substantial savings can be 
made if the inverse time-time noise correlation matrix, 
Nt^^, is assumed to be band-diagonal - an approxi- 
mation usually well fulfilled for the inverse of a band- 
diagonal noise correlation matrix. If this is the case, 
and, in addition, the time domain correlation length is 
short, then the inverse pixel-pixel noise correlation ma- 
trix, 7Vp~^, is often rather sparse. In this case additional 
savings can be made by using the sparse matrix algo- 
rithms, e.g., Sherman-Morrison- Woodbury formula (e.g., 
p^ ) to calculate both the map and the noise correlation 
matrix in the pixel domain. 
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FIG. 6: Inverse time-domain noise correlation matrices computed using MADCAP (middle panel) and circulant (right) approx- 
imations and compared with the exact result (left). For visualization purposes the time-stream segment length assumed here 
is only 5, 000 samples and the correlation length, Ac — 1000. 



The memory requirements are also reduced by consid- 
ering the various matrices' structure. Since we only need 
to keep the first row of the Toeplitz Nt matrix and we 
do not need Nt^^ explicitly, the major memory require- 
ments are set by the size of the A^Nt^^ matrix, which 
is O (npixTis), and of the noise correlation for the map, 
O (?T-pi2,). Note that the first of these limits can be re- 
duced at the expense of the operation counts, since there 
is no need to save all UpixUg elements of this matrix pro- 
vided we are prepared to re-calculate them as needed. 



2 Approximate methods 

One way to speed up map-making, while at the same 
time providing a good approximation to the optimal 
minimum variance map was proposed by Ferreira and 
Jaffe ||ll[ and incorporated into the MADCAP package 
by Borrill ||2^. Rather than explicitly inverting the Nt 
matrix we instead use the approximation. 



Ncr'itJ), if <min(n,/2,A,), 
0, else. 

(18) 



Here Us and Ac are the time stream and correlation 
lengths respectively, and the subscript C denotes the 
circulant part of the noise correlation matrix (Eq. 11). 
Apart from this approximation, the remaining steps fol- 
low precisely as before. Hereafter we refer to this ap- 
proach as the MADCAP approximation. 

Inversion of a circulant matrix can be accomplished us- 
ing FFTs (Eq. ^ in only O (n^ logn^) operations. This 
approach is therefore designed to reap the benefits of Fast 
Fourier methods, while at the same time providing a good 
approximation to the exact solution. If Hs ^ 4Ac the 
fraction of Nt~^ matrix elements seriously misestimated 
(i.e., affected by the segment 'edge' effects, see Fig. ||) 



by this procedure should only be C(4A^/ng), and its 
performance in the large Ug limit should be satisfactory. 
However, because the inverse of a Toeplitz matrix is Teo- 
plitz only when it is also circulant this is the only case 
when this approach is exact. We demonstrate how well 
this approximation fares in reproducing the pixel domain 
map and noise correlation matrix in realistic cases below 
(Sect. pID| ). 

The operation count changes only for the two items in 
the list, which now read, 

• noise inverse in time domain: 
Nt^Nt-^ : 0{ns\ogns); 

• noise weighted map: 



,A,dt^ A^Nt'^dt : 0{nAogns 



The scaling of the noise weighted map operation count 
above assumes the use of FFTs for the Toeplitz matrix 
multiplication - a trick explained in p6| . The memory 
requirement is set either by the larger of the size of the 
noise matrix in the pixel domain n^^^ or the time stream 

length Hg. If an efficient, fast - O (ris logins) - imple- 
mentation of the Toeplitz matrix inversion is viable then, 
at least in the common case of an approximately band- 
diagonal inverse noise correlation matrix Nt~^, the ma- 
jor computational advantage of the approximate method 
over the exact one would vanish, and the memory re- 
quirement would remain as its only asset. 
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C. Circulant approaches 

If wc take M ~ Nct^^ then the noise correlation ma- 
trix in the pixel domain can be expressed as 

Np = Ncp + Nsp, 
Ncp = [A^Nct-'Ay\ (19) 

Nsp = Ncp^A^Nct^'NstNct'^A^Ncp, 

where the time domain noise correlation matrix has been 
decomposed into its circulant {Net) and sparse (Nst) 
parts, 

Nt = Net + Nst. (20) 

The sparse term compensates for the off-diagonal cor- 
ners of the circulant matrix. As in the case of the MAD- 
CAP approximation, the only inversion required in the 
time domain is that of a circulant matrix. Hence it has 
been argued that, because the Nst matrix should be very 
sparse, the operation count in this approach should be 
significantly lower than in the exact minimum variance 
approach [g. 

The disadvantage of this approach is not in the non- 
minimum- variance map that results (since the loss of pre- 
cision is usually insignificant) but rather in implement- 
ing its more complicated algebra. If no assumption is 
made about band-diagonality, then the computational 
costs scale as: 

• noise inverse in time domain: 
Net Nct~^ : O (us \ogns); 

• noise inverse in pixel domain: 
circulant part: 

Nct~\A-^ A^Nct'^A: O (nl); 

sparse part: 

Nst,Nct-\A^ 

A^Nct-'NstNct-'A : O (n^.^nl); 

• noise weighted map: 

Nct'\A,dt^ A^Nct'^dt : 0(n,logn,); 

• noise matrix in pixel domain: 
Np-' ^Np: 0{nl,,). 

Note that the operation count is dominated by the sparse 
inverse computation [rig > Upix). The operation count 
can be reduced if Nt and Net are assumed to be band 
diagonal. However, even then, for experiments like max- 
ima the operation count remains O (n^) , usually with a 
large prefactor. Hence this approach is far from com- 
petitive with those discussed above, although it can be 
comparable for very short correlation length data. The 
memory required again scales as O {npixUg). 

A possible approximation, which we advocate here- 
after, is to neglect the sparse correction completely in 
the expression for the pixel-pixel noise correlation matrix. 



Clearly the operation count drops to O (n^) or O {rip^^) 
- whichever is larger - and is usually O {rip^^) especially 

if Nct~^ is assumed to be band diagonal. In this case 
the method's computational requirements are compara- 
ble with those of the MADCAP algorithm |||| . In the 
following we refer to this approach as the circulant ap- 
proximation (CAP). This approximation is clearly very 
similar to the MADCAP approach discussed above, al- 
though it is strictly exact for a diagonal noise correlation 
matrix as well as for an infinitely long time stream. 

D. Comparison and assessment 

Although the various map-making methods outlined 
above are algebraically similar, being derived from the 
same underlying equations, their implementations and 
generalized extensions to fully realistic time streams dif- 
fer considerably. 

The approximate methods are of particular interest, 
because they achieve the speed of Fast Fourier trans- 
forms. If that is attained without sacrificing the nec- 
essary precision is to be tested and may depend on the 
particular case at hand. The important parameters are 
the noise correlation length (Ac) and the time stream 
length (ris). If these numbers are comparable the differ- 
ences between the methods can be significant, but they 
should disappear whenever Ac ^ ris and 'edge' effects are 
unimportant. Given the presence of a so-called 1// noise 
component in the time stream, the assumption that Ac 
is independent of the time stream length may seem to be 
plainly wrong. Instead the correlations should be present 
on time scales comparable with the time stream length. 
However, a low frequency cutoff (e.g., AC high passing 
filter in the maxima experiment) usually limits the cor- 
relation length independent of the time stream length. 

Here we address and illustrate those issues using the 
MAXIMA-i experiment as an example. For this experi- 
ment the average length of the time stream segments is 
around 50, 000 samples, with some segments as short as 
35, 000 samples. The very gradual decay of the correla- 
tions with time makes it difficult to determine the cor- 
relation length precisely. However, we have found that 
the dependence of the resulting map on the value of Ac 
assumed vanishes before Ac reaches 10, 000 samples, and 
have therefore used Ac = 10, 000 in all computations. 
Hence, for maxima-i, Ac is less than but not negligible 
to Tig, and prior to taking advantage of the speed of the 
approximate methods we need to demonstrate their ap- 
plicability. 

Note that comparing the different methods is not en- 
tirely straightforward. Assumptions about the band- 
diagonality of either the time domain noise correlation 
matrix or its inverse, as required by the different meth- 
ods, are clearly not equivalent. The relation between the 
assumed bandwidths of these matrices is also not obvi- 
ous. It may seem that a fair comparison would be to 
take the bandwidths to be as large as possible in all 
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FIG. 7: An example of pixel domain noise correlations estimated for a single segment of maxima-i data set. In this figure, 
we show correlations of a single selected pixel with its neighbors. This pixel is marked in left panel with an arrow. The color 
scale shows the correlation level relative to the RMS value of the noise in the selected pixel. The scanning direction was from 
the left bottom corner to the top right, leaving a smudge of strong correlations in pixel domain. Pixels are numbered row by 
row from the left to the right and from the bottom to the top as shown in the left panel. Note a strongly distorted aspect ratio 
of the figure due to a scan elongation in a declination direction. These correlations as a function of a pixel number are also 
displayed in the middle and right panels. They show the relative correlations of the selected pixel with its closer (middle) and 
more distant neighbors (right panel). The chosen pixel has a number 210. Different lines correspond to different map-making 
methods used for the correlation estimations. Filled circles show results of the exact minimum variance, solid line - approximate 
circulant (CAP), dashed - approximate MADCAP and dot-dashed - exact circulant methods. 



methods. However, we then run into the problem of 
numerical inaccuracies unavoidably present in the cal- 
culation of the noise correlation matrix (or its inverse), 
especially at large time lags, which can result in a non- 
positive definite (unphysical) noise correlation matrix in 
pixel domain. Although power spectrum windowing (see 
Eqs. In], |l2|) can be used to alleviate numerical problems 
of this sort, this may also have different consequences for 
each of the methods. 

This point is particularly conspicuous when compar- 
isons are made with the exact circulant method. We find 
that this approach is very sensitive to both the choice 
of Ac and the windowing technique. For this reason we 
do not quote precise numbers for the comparison of this 
method with the others. By adjusting the free parame- 
ters of the method we can reproduce other methods' re- 
sults quite accurately, and within the expected statistical 
uncertainty of the maps. 

In the case of the other comparisons we have applied a 
Gaussian window function (Eqs. p^ ) while computing 
the noise power spectrum and kept similar correlation 
lengths for all three methods. In any case, the numbers 
quoted below should be looked at as indicative rather 
than as the best case possible. 

We choose to compare maps and noise correlation ma- 
trices directly in pixel domain, notwithstanding the fact 
that the approximations are really applied when the in- 
verse of the noise correlation matrix in the time domain 
is computed (see Fig. It is the quality of the estimate 



of the map and its noise correlation matrix which mat- 
ter for any subsequent analysis. Such an approach also 
seems to be more general than assessing the quality of a 
map through the application of a specific statistical tool. 

A sample of results is shown in Fig. |7[ The middle 
and right panels show two regimes of pixel-domain noise 
correlations as estimated using different methods. The 
left panel shows the correlations of a selected pixel with 
all the others as projected on the sky. These were com- 
puted for a single ^ 40, 000-sample segment of the CMBl 
scan covering an area of ~ 6 square degrees on the sky, 
corresponding to ~ 400 square 8arcm pixels. Neither fil- 
tering nor marginalization has been applied to the time 
ordered data. The agreement between results from the 
different methods is generally quite good. More quan- 
titatively, ~ 98% of the matrix elements show relative 
differences less than 20% when computed using the ex- 
act minimum variance approach and the MADCAP ap- 
proximation, and ~ 50% show differences less than 5%. 
Similar numbers are found for the comparison of this ex- 
act method and the CAP approximation. Usually, both 
approximate methods tend to overestimate high positive 
correlations. The MADCAP approach also seems to un- 
derestimate the amplitude of negative correlations, yet 
frequently recovers weak correlations more precisely than 
the other approximation. The relative differences of the 
maps are of the order of 10% (Fig. which, on aver- 
age, amounts to no more than 10% of the rms noise level 
predicted for a given pixel (Fig. ||). 
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FIG. 8: Simulated MAXiMA-l-like maps of the CMB anisotropy. Left panel shows the simulated map used for creating the time 
stream data of a single detector of a maxima- like experiment. The time stream was subsequently used to recover the map of 
the sky applying the approximate CAP (middle) and MADCAP algorithms. 



To test if the discrepancies are due to numerical errors, 
rather than differences in the algorithms, we use the ex- 
act method with an explicitly circulant noise correlation 
matrix, and compare the result with that calculated us- 
ing the approximate circulant approach. In the absence 
of numerical inaccuracies both results should be identi- 
cal. We find that the differences are predominantly at 
the 2 — 3% level. We obtain similar error estimates when 
analyzing a diagonal noise correlation matrix using all 
three methods. 

We can also ask if there is any systematic error incurred 
as a consequence of the approximations which bias the 
results of a particular approach. One way to address this 
issue with simulated maps is to check whether the actual 
noise in an estimated map is properly described by a 
concurrently estimated noise correlation matrix. This is 
clearly the case for the exact methods, which are derived 
to obey such a requirement. To test the approximate ap- 
proaches we define nisky as the true noiseless map of the 
sky used for a simulation, rric, Nc, and rrirm Nm are 
maps and pixel-pixel noise correlation matrices as esti- 
mated by the CAP and MADCAP approximate methods 
respectively. For each of the noise matrices we introduce 

a matrix N^^/^ such that TV^^^^ (nT^''^) = N^K 
Consequently the variable y^, defined as 



-1/2 



[mi 



^sky ) 



c, m, 



(21) 



should be a vector of uncorrelated, Gaussian variables 
with a dispersion equal to unity if no systematic prob- 
lem is introduced by an approximation. This can be 
tested using, e.g., the Kolmogorov-Smirnov test (see also 
Sect. VI). Such a test is comfortably passed by both 
methods for noise in the pixel domain ranging from 
~ 100/iK per 8 arcminutc pixel (the maxima-i single 



channel level) down to ~ 60/iK (the maxima-i 4-channel 
combined level). 

The differences between the results of the various 
methods, as summarized above, are found to be rather 
small, and indicate that all the methods may be equiv- 
alent in practice. Although the problem is difhcult to 
tackle in a more quantitative and general way, it can be 
addressed from the point of view of specific statistics de- 
rived from the maps. An example of such a test, compar- 
ing the power spectrum statistic, is shown in Fig. |lO|. In 
this case the power spectra of the maxima-i maps com- 
puted using both map-making methods show very good 
agreement. 

In practise, the approximate methods have an ad- 
vantage over the exact methods of being easily imple- 
mentable. In fact, both can be straightforwardly applied 
using the MADCAP package ||2^ . The differences in the 
results provide some insight into the sensitivity of the 
results to the treatment of time domain noise correla- 
tions. More to the point, the result of any statistical test 
applied to the map, which depends on the choice of the 
approximate method used during map-making, should be 
treated with suspicion. We have demonstrated that for 
the maxima-i data analysis and, in particular, for power 
spectrum estimation, the differences between the meth- 
ods are negligible. 

In summary, all four map-making methods are broadly 
consistent at the noise level of the maxima-i maps. This 
concordance is only expected to improve if longer time 
stream segments are considered. Numerical errors can 
readily be kept under control even with as many as 
O (lO^ — 10^) time samples, at least for MAXiMA-like 
data sets, although further tests may be needed for low 
noise cases. Both approximate methods are easy to im- 
plement, with their speed being limited by the pixel do- 
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FIG. 9: Comparison of two map-making algorithms applied to the real maxima-i data of a single photometer. The left panel 
shows the map computed using the approximate circulant (CAP) approach. The middle panel shows the map recovered using 
MADCAP approach. The difference of both divided by an estimated rms of the pixel noise is shown in a right panel. Note 
that the color stretch limits are from -0.25 to 0.25 in the right panel. 
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FIG. 10: Comparison of power spectra computed for the 
maps made with two different map-making algorithms as 
shown in two left panels of Fig. |^. The data are those of 
a single maxima-i detector. Circles show the results for the 
map made using approximate circulant (CAP) algorithm, and 
diamonds show the results for the map made using the MAD- 
CAP approximation. 

main noise correlation matrix inversion. The exact min- 
imum variance method provides a useful cross-check on 
both the validity of the approximations and numerical 
error propagation, and, with further improvements may 
be able to achieve the computational speed of the ap- 
proximate approaches. 

We note again that all of this assumes a continuous 
time stream vifith stationary noise resulting from appli- 
cation of the gap filling algorithm; the restoration of the 
Toeplitz character of the noise correlations is crucial for 
the feasibility of the exact minimum variance approach, 
as well for as the accuracy of the approximate methods. 



IV. MAP-MAKING: AMENDMENTS 

So far we have been assuming an 'optimistic' model for 
the signal in the time stream, dt (Eq. |l3| ). Commonly, 
unwanted contributions (e.g., the overall average temper- 
ature, primary mirror- or gondola- synchronous noise) are 
present in the data (Eq. [T]) and have to be dealt with if 
a reliable result is to be obtained. 

Let us consider a case of an arbitrary time domain con- 
tribution of known temporal behavior. We take these to 

(i) 

be given by a set of m template vectors , of unknown 
amplitudes, a;*-*-* , so the resulting time stream equation is 

m 

dt = Amp + nt + J2 x^'^'Tt^- (22) 

i=l 

The additional number 2;*^*-' can be formally treated as 
an extra (fictitious) pixel 'observed' with a 'pointing ma- 

trix' as given by . These additional numbers can be 
estimated during the course of a map-making procedure 
and then marginalized over. Alternatively, the entire ex- 
tra term, x'-'V^*'*, can be thought of as a part of the 
noise term and directly marginalized over in the course of 
map-making. Clearly these two approaches are just dif- 
ferent implementations of the same idea, yet, depending 
on the particular problem at hand, one or the other may 
be more appropriate. In the following, we refer to them 
as the extra pixels and marginalization methods, respec- 
tively, and discuss them and some of their applications. 
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A. Extra pixels. 



Let us define a vector x„ as, 



2,(1)^ . . . ^^(™) 



and a matrix B such that, 



~t 7 



(m) 



We can now rewrite Eq. as. 



df = Am„ + Bx„ 



nt. 



(23) 



(24) 



(25) 



From this it is clear that the extra time stream contribu- 
tion can be thought of as a set of extra (fictitious) pixels 
{xq) 'observed' with a pointing matrix given by B. If 
we, furthermore, recast this equation as. 



[A,B] 



nt = Arhp + nt. 



(26) 



we find that it is formally identical to Eq. (|13|). Here, we 
have introduced a generalized pointing matrix, and a 
generalized map, rhp. Those correspond to an 'ordinary' 
map and a pointing matrix, but now extended to incor- 
porate also the extra pixels. The map-making methods 
discussed above can all be applied in the present case, 
with all their caveats and strengths. The appropriate 
equations preserve the form of Eqs. (|lj) & ( [l5| ) but 
the pointing matrix, A, the map, rUp, and the pixel do- 
main noise correlation matrix, TVp, need to be replaced 
by their generalized counterparts, i.e.. A, rhp and Np re- 
spectively. As a result, the map-making procedure pro- 
vides an estimate of a generalized map and a generalized 
noise correlation matrix in pixel domain. 

In addition to the minimum-variance considerations 
described above, the formalism so far described can also 
be thought of as describing the likelihood function for 
the data: the distribution of the underlying (general- 
ized) map is Gaussian with mean and variance as given 
by Eqs. (^6|) & ( p7| ) (or approximations to these). Thus, 
the generalized map is a simultaneous estimate of both 
the real map and the 'extra' pixels. Knowledge of the 
amplitudes of these extra pixels are often useful for diag- 
nostic purposes. However, we are in the end interested 
in the real map itself; hence we wish to marginalize over 
the Xq. With a uniform prior, we find the usual Gaussian 
results: 

• the marginalized map is just the generalized map 
with the extra pixels removed; and 

• the marginalized noise matrix is just the general- 
ized noise matrix with the rows and columns cor- 
responding to the extra pixels removed. 

Clearly, only well-understood features of the time 
stream, for which the pointing operator B is known, can 



be accounted for using this approach. Moreover, if the 
extra component is indistinguishable from the CMB tem- 
perature map, i.e., is sky-stationary, then the clean sepa- 
ration of the map into CMB and non-CMB components 
is impossible. That is manifested as a singularity of the 
A^MA (see Eqs. |lj & |l^) matrix. In some cases, if 
such a singular pixel-domain mode is determined, it can 
be ac count ed for on subsequent stages of the analysis (see 
Sect. [V C| ). In general the discussed framework is quite 
universal and can be applied successfully in a variety of 
circumstances of practical interest. Specific examples of 
its applications are time stream gaps, relative offsets be- 
tween separate segments of the map, and primary mirror 
synchronous effects. All those were found of importance 
for the MAXIMA-I data analysis and we discuss them in 
detail below. 

a Time-stream gaps, ojfsets and temporal frequen- 
cies. Perhaps the most straightforward applications of 
this method are to time stream gaps, time-stream seg- 
ments offsets, and unwanted temporal frequencies. 

The direct application of the extra pixel method to 
the unprocessed raw time stream in order to properly 
marginalize over the unknown content of the gaps would 
require introducing as many extra pixels (parameters Xi 
in Eq. ^) as time samples in the gaps. Not only is this a 
computationally significant extension, but also a source 
of a multitude of possible numerical problems and sin- 
gularities. With the gap-filling procedure as described 
earlier, one extra x parameter (the gap pixel) and a sin- 
gle template, Tt, such as, 



Tt(^) 



1, i E gaps; 
0, otherwise; 



suffices to account for all of the gaps of a single seg- 
ment. That is pre cisely the gap pixel approach briefly 



mentioned in Sect. IIIB 



If a map of more than a single segment is to be pro- 
duced from a single map-making procedure (an approach 
adopted in MADCAP), care must be taken with regard 
to unknown relative offsets between different segments. 
In total power experiments the actual offsets are spuri- 
ous and contain no information about the sky. In the 
parlance of the extra pixel method, the offsets can be 
considered as amplitudes, a;*-^-*, of a set of time-domain 



templates. 



.(I) 



defined for each segment I as 



.(I) 



00 



1, i e I; 
0, otherwise; 



and therefore they can be straightforwardly incorporated 
within the framework of the extra pixel method and the 
map-making formalism ]29|] . 

Similarly unwanted temporal frequencies can be de- 
scribed with one extra parameter, their common ampli- 
tude, if they are first filtered out from the time stream, 
and then replaced by a pure Gaussian noise realization 
with the noise power spectrum as estimated earlier. 
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FIG. 11: The primary mirror synchronous signal recovered from the simulation (left panel) and the real maxima-i data as a 
function of the discretized position of the primary mirror with respect to the gondola. In each of the panels, the recovered 
signal for each of the six segments of the CMBl scan is shown with thin solid line. In the left panel, the thick dashed line 
shows the primary mirror synchronous signal as used for the simulation. The apparent asymmetry of the recovered signal in 
contrast with the symmetry of the function used for simulation is due to a residual CMB dipole contribution shown with a 
thin dot-dashed line. A very similar, slightly asymmetric, shape of the functional dependence is found for the signal recovered 
from the real data as shown in the middle panel. That panel shows the estimated amplitude of the signal synchronous with 
the mirror position, which was added to the time stream data prior to the AC low-pass filtering and therefore also includes the 
sky signal, such as the CMB dipole. The rightmost panel shows the estimated primary mirror synchronous signal added to the 
time stream after AC low pass filtering, and therefore originating in the instrumental electronics. Note the very good stability 
of that contribution with time and its negligible amplitude, as compared to the signal shown in the middle panel, as well as to 
the expected CMB anisotropy. The noise level corresponding to different mirror positions is strongly correlated and for that 
reason not show in the figure. A typical level is ~ 40^tK and ~ IpK for the middle and left panel. 



b Primary mirror synchronous signal. Periodic mo- 
tion of the primary mirror and the gondola can poten- 
tially become a source of an extra parasitic contribution 
to the total signal measured by a MAXiMA-like experi- 
ment. Due to its origin, such a contribution is likely be 
a single-valued function of the position of either the pri- 
mary mirror or the gondola and therefore can be modeled 
using the extra pixel method discussed above. 

Only the primary mirror synchronous effect has been 
found to be important for the actual maxima-i data and 
required this treatment. Below we describe this case in 
some detail as an example. 

Assuming that the primary mirror synchronous contri- 
bution is a slowly varying function of the mirror position, 
we can characterize it using a discrete set of amplitudes 
(i.e., an extra pixel 'map'), each of which describes the 
magnitude of the parasitic signal corresponding to a nar- 
row range of mirror positions (i.e., an extra pixel). The 
extra pointing matrix B assigns then the time samples to 
these discretized primary mirror positions. The presence 
of the instrumental filters introduces an additional com- 
plication. They induce phase shifts in the time stream 
and therefore modify the pointing matrix in a way de- 
pending on the precise location that the synchronous sig- 
nal appeared in the on-board electronics chain (which we 
do not know a priori). In principle, there are four possi- 



ble choices for the correct pointing matrix of the primary 
mirror synchronous signal. However, the effective, total, 
phase shift, caused by the instrumental filters, is strongly 
dominated by the AC low pass filter, leading, therefore, 
to only two significantly different choices for the com- 
bined pointing matrix of this synchronous effect, B. We 
choose those to be, 



B = B, 
B = Fboic 



high 



'B. 



Here we use B to denote the discretized mirror position 
and denote instrumental filters as in Sect. II B. The first 



choice above corresponds to the case in which the syn- 
chronous signal is added prior to AC low pass filtering, 
e.g., a mirror related modulation, and the second one to 
the case in which the extra contribution arises later on. 

A number of extra pixels depends on a particular prob- 
lem at hand. In tlie maxima-i case we use a separate set 
of extra pixels, consisting of 20 to 50 discretized mirror 
positions, for each of the segments. The relative offset 
of the primary mirror signal and the CMB map is ar- 
bitrary, therefore we constrain the value assigned to the 
first extra pixel (corresponding to the leftmost position 
of the mirror) to be zero. That breaks the degeneracy 
between absolute offsets for both components and avoids 
a singularity in the generalized noise correlation matrix 
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in pixel domain. 

For MAXIMA-I we find no other singularity caused by 
the inclusion of extra pixels describing the primary mir- 
ror synchronous signal in the map- making problem. This 
is due to the carefully designed maxima-i scanning strat- 
egy; the typical time scale for variation of the extra con- 
tribution is significantly longer than the time needed for 
the instrument antenna to cross the pixel on the sky, and 
the majority of real sky pixels are revisited many times, 
with the primary mirror position different at each visit. 

By using a single set of extra pixels per segment, we 
make an implicit assumption about the stationarity of 
the underlying synchronous signal on the time scale of 
the length of the segment. Such an assumption needs 
to be tested a posteriori. Recovered primary-mirror- 
synchronous signals are shown in Fig. nll These results 
are for simulations with the synchronous signal explic- 
itly assumed to be strictly stationary, and for the actual 
MAXIMA-I data. In the latter case, the results for dif- 
ferent, but adjacent, segments of the same detector are 
indeed consistent within the error bars, suggesting that 
the assumption of stationarity is satisfied. 

The differences between recovered signals can be the 
result of actual differences of the instrumental signal, 
or of the sky signal, which can have a component syn- 
chronous with the mirror position, or because of instru- 
mental noise. We find that for the maxima-i experiment, 
a part of the dipole signal is subtracted from the sky 
map together with the primary mirror synchronous sig- 
nal. This effect is mainly due to the monotonic gradient- 
like morphology of the dipole within the boundaries of the 

10° X 10° patch observed by MAXIMA-I. For aU mul- 
tipole modes which vary significantly across that patch, 
i.e., for £ 20, such an effect is expected to be unimpor- 
tant. We show effects of the subtraction of the primary 
mirror synchronous signal on a power spectrum estima- 



tion in Fig. 12 



B. Marginalization. 

In the extra pixel method, we first determine the gen- 
eralized map and noise matrix, and then marginalize over 
the unwanted pixels. In the case of the marginaliza- 
tion approach, we first marginalize over the unwanted 
temporal modes, and then pass directly to the marginal- 
ized map and noise matrix. It is based on the idea that 
the undetermined amplitude can be treated as a random 
variable with a prior probability with dispersion cr^ [ p^ . 
Again, we start with the more complicated time stream 
of Eq. (|2^), but for compactness allow only for a single 
template, Tt- Now, we assign the unknown amplitude, 
a;, a prior probability density with variance (x^) — cr^. 
Then, we marginalize over this unknown amplitude right 
away, before making the map. We then recover a distri- 
bution for the time stream in the form of a Gaussian with 
an effective time stream noise matrix given by (hereafter 



denotes a tensor product). 



at; = Nt 



(27) 



That is, we can consider both the Gaussian noise (nt) 
and the template {xTt) together as a noise-like term with 
this correlation matrix. The more complicated equation 
( p^ ) can be then recast in the familiar form of Eq. (p^). 
Solving for the maximum likelihood map gives expres- 
sions in the pixel domain mirroring that of Eq.([l^), with 
the noise correlation matrix in time domain replaced now 
by TVj'. Marginalizing over the amplitude, x, while tak- 
ing the limit ct^ — > cxd, is equivalent to marginalization 
with a uniform prior, as considered above |18| . This is 
tractable because we can simplify the calculation of N^~^ 
using the Sherman-Morrison- Woodbury formula p^ , in 
this context given by 



(Nt-' 



t-'Ttf 



Tt 



Nt 



t) ® {Nt 



Tt^Nt-': 



As expected we have. 



N't 



'Tt = 0. 



(28) 



(29) 



Finally, note that the usual map-making formulas in a 
minimum variance case (Eqs. ^ & 17) require only the 
inverse noise correlation matrix, Nt^ , and remain un- 
changed if Nt^^ is replaced by TV^' ^. Because the cor- 
rection to Nt^^ is additive, we can also think of this as 
a correction to the inverse pixel noise: 



A'^Nt-^Tt) ® [A^Nt-^Tt 



Tt^Nt-'rt 



(30) 



One might suspect that a weakness of the marginal- 
ization method would be the computational overhead in- 
volved in computations of the extra term in Eq. ([2^). 
However, the products of ATj^^ matrix and a template 
vector Tt requires only O (rig log rig) operation if ATt"^ 
is Toeplitz, or O {risXband) if it is (approximately) band- 
diagonal with a band- width, Xtand- Also, recalling that 
what we need in Eq. is A^AT^ ^ A rather than AT^' ^ 
itself, we can cut the number of necessary operations by 
performing the products from outside inwards. That can 



be done in either O {us) or O \ 



pix , 



floating point opera- 



tions, whichever is larger. So the final scaling for any sin- 
gle template is given either by O (n^ log rig) or O (r 
If more than one template is required than Eq. 
needs to be generalized and reads. 



AT;"^ = ATt^i - (Nt'^B) Is'^Nt'^B] {Nt^^B 



(31) 
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FIG. 12: Power spectra of the CMB anisotropies recovered from the simulations (left panel) and the maxima-i single detector 
map (right) . In the left panel all shown spectra are computed assuming the same CMB sky and noise realization and a maxima-i 
like observation. We demonstrate here the performance of the extra pixel method in a case of primary synchronous signal. The 
displayed points correspond to four cases: with a primary synchronous signal absent in the time stream data and either not 
accounted for during map making (triangles) or accounted for (open circles) using the extra pixel method on the map-making 
stage. The remaining two cases, with a primary mirror signal present in the data are shown with open diamonds (no extra 
pixel method applied) and filled diamonds computed for the map produced with the extra pixel method. Right panel shows 
results of tests of the extra pixels method applied directly to the data. Spectra shown with open symbols have been computed 
using different definitions of the extra pixels: diamonds are for the standard case with synchronous signal depending only on 
the position of the antenna, and triangles are for the case when also a dependence on a direction of the primary mirror motion 
(left or right) is allowed. Filled circles depict the spectrum computed for a map with no extra pixel method applied during 
map making. Clearly, only the power in the bin centered at ~ 150 has been affected by the primary mirror synchronous signal. 



where B is as defined in the previous Section (Eq. 24), 
and the expression in the square brackets is a square ma- 
trix of the size given by a total number of time-domain 
templates. We have assumed that the template ampli- 
tudes are uncorrelated, i.e., (^x'^^^x^^^'j — if i ^ j. In 
this case again (cf., Eq. |2g|). 



Nl'^B = 0, 



and, hence, also for each template, Tf^\ (cf., Eq. |2 



0. 



(32) 



(33) 



This approach is fully general and can be used in all 
the cases already mentioned in the context of the extra 
pixel method — both methods give identical results. The 
advantage of the extra pixel method is that it not only 
marginalizes over the unwanted effect, but also computes 
its individual maximum-likelihood estimate. That can 
be useful for a posteriori tests, if some prior assumptions 
have been made to extract the effect. 

Note also that we have constructed a matrix we call 
N^^^ . However, in the limit cr^ oo, itself no longer 
exists: by construction it has an infinite eigenvalue corre- 
sponding to the eigenvector Tt. (There are also patholog- 
ical cases where can be inverted because of numer- 
ical effects.) Nonetheless, the pixel-domain noise matrix 
N' (= {A^ N^~^ A)~^) may still exist due to mixing of 



the modes via the operator A. In fact the resulting 
Av. 



is singular only if Tt 
template. Then from Eq. (t; 



where Vp is a pixel domain 
1) it follows that (cf. Eq. |9|), 



N'^Vr, = 0. 



(34) 



In cases when exists the 'extra pixel' approach will 
work as well. However, the marginalization procedure 
would work even if A'^N'r 

A is singular as we discuss 



that in Sect. IVC 



Below we elaborate on a particular application of the 
marginalization method to the removal of specific fre- 
quency modes from the time stream. 

a Frequency band marginalization. Here we will use 
the method of the explicit marginalization, outlined 
above, to derive a useful concise formula marginalizing 
over unwanted frequency bands. In such a case a required 
set of templates is given in the time domain as. 



(m 



)(j) 



27rio {j + m) 



(35) 



Here, io corresponds to the frequency /o = ^o/?^^:A, which 
is to be marginalized over, j is a time variable, and 
TO = 0, [ns/io] determines the overall phase shift and 
also numbers the template. As usual Us stands for the 
length of the time stream segment under consideration 
and A is the sampling rate. We can now insert these 
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templates into Eq. (|3l| ) to obtain a fully general formula 
for the inverse noise correlation matrix with a frequency 
mode, /o, marginalized over. As explained above, a nu- 
merical implementation of such a strict marginalization 
is feasible within a framework of the minimum variance 
methods. This expression simplifies significantly if Nt~^ 
is a circulant matrix, Nt~^ = Nct~^ ■ Then N^~^ is 

also circulant, A^^ ^ = ^'ct ^' ^'^'^ both are uniquely 
defined by their power spectra (cf., Eq. |9|). Here, we de- 
note these as V' (/) and V (/). From Eq. ( pT| ) we then 
have 

V ii) =V{i)-V (zo) (<5^ [i. in) + S'' {^. -*o)) . (36) 

Here 5^ is the Kronecker delta, and we have made use 
of the fact that in Fourier space the template T(„) is 
represented as, 

f(^)(j) ^ cosf^) [5^(j,zo) + <5^(j- -*o)](37) 



n 



0'" \ \xK 



with L denoting an imaginary unit. From Eq. (pq), we 
have V' (io) — V (— «o) = 0, and therefore, in the case of 
circulant noise matrices, the extra marginalization term 
in Eq. ( pT| ) just zeroes the power of the frequency mode 
which is to be marginalized over. In fact, such an an- 
swer could have been guessed, by noting that the single- 
frequency modes are eigen-modes of the inverse of the 
circulant matrix (as well as the circulant matrix itself) 
and that to introduce a Sherman-Morrison- Woodbury- 
like correction with respect to any of those modes it is 
sufficient to set their corresponding eigen-values to zero 

This observation suggests a simple prescription for fre- 
quency marginalization in the case of the circulant ap- 
proximate map-making algorithm discussed before. Be- 
cause the power spectrum related to the matrix Nct~^ is 
the inverse of the noise power spectrum (cf . , Eq. ^ , it is 
possible to marginalize over the amplitude of a frequency 
mode, /o, by setting the inverse of the noise power spec- 
trum corresponding to that frequency to zero prior to 
calculating Nct^^ via Eq. (|l^). 

The impact of such a procedure on the final map 
is clear from Eq. ([l^): the frequency modes with ze- 
roed power are removed from the map but that is self- 
consistently accounted for in the corresponding noise cor- 
relations matrix. 

Similarly, the MADCAP approach can be modified by 
adopting the following approximation to the inverse noise 
correlation matrix in time domain (cf. , Eq. |l^) , 



TV' " 



if \t-j\<ns/2, (38) 



and zero otherwise. The eigenvectors of ^ are not 
in fact identical with those of ^ and, therefore, the 

former will not usually have the eigenvalues equal pre- 
cisely to zero anymore. The removal of unwanted modes 



from the map in this case is therefore only approximate. 
Again, this is an effect we have found to be negligible in 
practise. 

One may wish to marginalize over frequency bands 
which are compromised by a periodic parasitic sig- 
nal (e.g., synchronous effects). However, in this case 
this method is less discriminating than the extra pixel 
method, making no use of the phase information usually 
available. Marginalization is also useful to minimize the 
significance of sampling uncertainty present at low fre- 
quency and leadi ng to errors in the noise estimation pro- 
cedure (see Sect. H C ). It can also be applied at the high 
frequency end, where the precise shape of the instrumen- 
tal filters is not well known. In fact, in the MAXIMA-I case 
we have applied marginalization to deal with the lowest, 
^ 0.1 Hz, and the highest frequencies, ^ 30 Hz, for the 
reasons just mentioned. We have found no strong de- 
pendence of our results on the specific choice of bounds, 
obtaining nearly identical results if these values are set 
to be 0.2 Hz and 20 Hz respectively. 



C. Singularities and pixel templates. 

Having accounted for these time stream 'templates', 
Tt, we can produce the map and the inverse pixel-domain 
noise matrix. As mentioned above, we may not be able 
to obtain the noise matrix itself, due to zero eigenvalues 
in the inverse, corresponding to 'infinite noise' in some 
modes. 

However, just as the mapmaking procedure per se only 
requires ^ , subsequent manipulations of the map of- 
ten can be cast in terms of matrix inverses. (In an- 
other language, infinite noise corresponds to zero weight.) 
For example, the likelihood function if we consider a 
Gaussian-distributed signal is 



1 



\2TTMr 



,1/2 



exp 



(39) 



where rrip is the pixelized map as before, and Mp is the 
variance corresponding to the uncorrelated sum of the 
CMB signal, Sp{Ci), and the pixel-domain noise cor- 
relation matrix, i.e., Mp = Sp{Ci) + Np. If, due to 
time domain marginalization, Np^^ has zero eigenvalue 
corresponding to a pixel-domain template (cf., Eq. p4[ ), 
{A^ Nt^'^ A)-^ Nt^^Tt, then we compute Np 



Up 

as 



Np = {Np ^ + e (g) Vp'^) ^ - e ^Vp(^ Vp^ , (40) 

where e is a small positive number and the inversion on 
the rhs is to be performed directly. This procedure re- 
places an infinite eigenvalue of the Np (corresponding to 
the eigenmode Vp) with zero, leaving all other eigenval- 
ues unaffected. The total inverse correlation matrix is 



20 



now to be understood as, 



Mp ^ = lim {Sp{Ct) + Np + alvp(^Vp^ 

- T 



(41) 



where Vp = {Sp (Ci) + Np)^^ Vp and the last term is 
a usual Sherman-Morrison- Woodbury term. This ex- 
pression assumes that Sp (Ce) + Np is invertible. How- 
ever, any zero eigenvalue modes, additional to Vp, can be 
treated in the same way as Vp, once a singular mode has 
been determined. 

Even if we have not explicitly accounted for effects that 
may leave us with a singular Np, we can use a similar 
technique, if we know the pixel-domain pattern of the 
responsible modes. In this case, in analogy with the time 
stream case (Eq. p5|), we can write the map as 



avr 



(42) 



the final result can be still correct, while such a com- 
putation may accidentally duplicate the (approximate) 
numerical marginalization technique discussed by Bond, 
Jaffe & Knox ^ and, e.g., implemented in MADCAP. 
However, it is still advisable to first determine the singu- 
lar modes prior to applying such a method. 

We also note in passing that this same formalism can 
be used to 'marginalize over' other sorts of template am- 
plitudes at the map stage, rather than the time stream 
templates considered earlier. An important example of 
this are templates corresponding to known sources of 
foreground emission, such as galactic dust, whose spatial 
morphology is well-known from studies at other wave- 
lengths, and are sub-dominant but potentially important 
contaminants at CMB wavelengths [ p^ . 

For other easily identifiable singularities see the next 
Section. 



D. Combining maps of time-stream segments 



Here, Sp is the signal, rip is the noise, and Vp describes 
the shape of the unknown mode, with unknown ampli- 
tude a, over which we will marginalize. Once again we 
can make an use of Eq. (^Tj) , with Np this time computed 
in the standard way. 

As an example, the total offset of the map is spurious 
and undetermined (and it is often numerically convenient 
to set it to zero). That is, the detectors are only sensitive 
to temperature variations, rather than absolute temper- 
atures. In the case of a lack of correlation between the 
noise and the underlying map (as we have assumed all 
along), the inverse of the noise correlation matrix com- 
puted for such a map should be singular by construc- 
tion. The eigenvector corresponding to the zero eigen- 
value is just a constant function of a pixel number, i.e., 
Vp^ = [1, 1]'^. Knowing that, it is straightforward to 
perform the 'inversion' as in Eq. (^). In this specific 
case such information can be included while computing 
the power spectra using the MADCAP package ||2^. An 
analogous problem has been addressed by the COBE- 
DMR team This equation can be straightforwardly 
generalized for cases with more complicated eigenvectors 
as discussed below. 

Eq. (|4^) simply sets to zero an infinite eigenvalue of an 
inverse matrix. That formal procedure does not 'solve' 
the problem of singularity; rather it gives a compact ex- 
pression for the noise correlation matrix, which together 
with the knowledge of the singular modes provides com- 
plete information needed for statistically sound exploita- 
tion of the map. 

If such a singularity is identified, it usually can be 
dealt with efficiently, producing a statistically valid re- 
sult. Therefore, determination of singular modes present 
in the map has to be a part of a robust map-making 
procedure. Numerical inaccuracies often obscure singu- 
larities, making them difficult to find. The presence of 
such an undiscovered singular mode does not necessar- 
ily invalidate the outcome. In fact, in some applications. 



The map-making formalism as presented so far can be 
applied to a number of statistically independent segments 
simultaneously. However, it may be advantageous to an- 
alyze each of these separately. In this case one needs to 
combine the separate segments together at the end. For 
Gaussian noise this would be quite straightforward, were 
it not for arbitrariness in the offset of each segment. The 
latter introduces possible relative offset shifts between 
the segments. This can be resolved for partially over- 
lapping segments if we require them to display, within 
the noise uncertainty, the same underlying pattern in the 
common region. This introduces an extra complication 
to the well-known formula (e.g., ||3^) for the optimal co- 
addition of two maps. The maximum-likelihood prob- 
lem can be solved in a standard manne r or using an ap- 
proach analogous to that of Sect. IV B . In consequence, 
on defining, for each time-stream segment, 'Up(/) as a 

pixel-domain vector of ones and Mp(7) = -^p(7)~^'Up(7), 
and introducing a corrected inverse noise correlation ma- 



trix, N;^j^- 



such as (cf., Eqs. 



El)' 



^P(i) -^P(n up^rfup^i) 



(43) 



we can express a final full map and a corresponding noise 
correlation matrix in a familiar manner. 



Here the sum is over maps of all segments to be combined. 
As discussed above the undetermined absolute offsets of 
each of the segments separately is reflected in the sin- 
gularity of each of the redefined inverse noise correlation 
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FIG. 13: Pixel domain correlations of the noise projected on the sky. All three panels show a level of the correlations relative to 
the RMS value of the noise for the same pixel, which is marked with an 'x'. Color coding is shown in left panel. This pixel has 
been observed twice during the maxima-i flight. The noise correlations for the first observation are shown in the left panel, and 
these for the second one in the middle panel. Right panel shows the final co-added noise correlations (see Eq. Clearly, due 
to the MAXIMA-I scanning strategy and the presence of the noise correlations in the time domain, the noise correlation pattern 
in pixel domain is highly anisotropic and strongly correlated as a result of any single observation of a pixel. The combined 
noise, for all pixels which were observed twice, is however significantly less correlated and more isotropic. 



matrices, ■^p(/) ■ The final inversion in Eq. again 
has to be understood as in Eq. @ and the singular- 
ity needs to be accounted for in subsequent stages of the 
data analysis, as, for instance, shown in Eq. (pT|). 

A MAXIMA-I based example of an application of this 
procedure is shown in Fig. 1^. As in the case of a sin- 
gle continuous time stream segment |^ , re-observing the 
same patch of the sky along the different scanning direc- 
tion not only suppresses the level of the noise per pixel 
but also weakens and isotropizes the correlation pattern 
in pixel domain. 

Note that the expression in the curly brackets of 



Eq. (44) becomes singular also, if not every segment map 
is connected (directly or through the number of inter- 
mediaries) with the others. If no such link exists the 
relative offset of such a segment map with respect to the 
rest of the map remains unknown giving rise to a singu- 
lar mode in addition to the one related to the absolute 
offset. However, Eqs. (Q & (^) can still be applied 
if the inversion of the singular matrix is interpreted as 
in Eq. (40). The singular eigenvector Vp has to now be 



appropriately replaced. 

For instance, in a case of a single disjoint segment the 
two singular modes emerging as a result of the unknown 
total offset of the full map and a relative offset of the 
disjoint part can be chosen as having ones for every pixel 
belonging to one disjoint part and zeros elsewhere (or vice 
verse). With the inversion now described by the straight- 
forward generalization of Eq. (^^ for the case of multiple 
singular eigenvectors, the final product of the operation 
given by Eq. ( ^sj ) can be a map composed of many discon- 
nected regions with the uncertainty due to our ignorance 



of their relative offsets incorporated into the total noise 
correlation matrix, N^°* , as given by Eq. (^4|). 

Numerically one may encounter nearly singular cases 
whenever the overlapping region between two segments 
is too limited or the noise per pixel in the overlapping 
area too high to provide any useful constraint on the free 
offset. A practical and safe way of dealing with such a 
problem is to reject a (small by assumption) number of 
common pixels to make a given part genuinely discon- 
nected and to account in a mathematically strict manner 
for the arising singularity of the noise correlation matrix. 

A power spectrum of such an unconnected ma p can 
be subsequently computed as explained in Sect. IV C . 
However, that requires more involved algebra than that 
currently implemented in the MADCAP version of the 
quadratic estimator. For that reason we have rejected 
all the disconnected segments from the MAXIMA-I maps 
while computing their power spectra. In that way the 
number of segments used for the final analysis decreased 
to 14 per detector [|[ || . 



E. Combining maps of different photometers. 

The additional difhculty here in comparison with the 
previous Section is introduced by the possible relative cal- 
ibration difference between maps produced for different 
detectors. Calibration uncertainty is generically difficult 
to be included into a maximum likelihood framework due 
to its multiplicative character. For the maxim A-i data set 
we have found that the relative calibration between differ- 
ent photometers' maps are with a high precision correct 
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if the mean dipole based calibration is adopted for each 
of the maps . Therefore rather than seeking a general 
solution to the problem, we combined maps as given by 
the Eq. ( p5| ) and used the largest single detector error for 
the calibration uncertainty of the combined map. 



F. IjOW-£ aliasing. 

A potential bias of the final anisotropy power spectrum 
(but also for other statistics, see, e.g., (2^) resulting from 
this kind of the data analysis is due to an incomplete (and 
modest in a case of all balloon-borne experiments) sky 
coverage. That induces the correlation between otherwise 
uncorrelated (for a statistically isotropic sky) £ modes. 
As a consequence, power contained in the low-^ modes 
beyond the detection capability can leak to the higher £ 
modes which are to be estimated. Because the amplitude 
of the anisotropy power spectrum usually decays with 
increasing £ - and is many orders of magnitude higher in 
monopole and dipole than in any other mode - there is 
a potential for biasing of the \ow-i end of the estimated 
power spectrum. 

One possible approach is to consider the unwanted 
modes as pixel-space templates as discussed above 
(Sect. IIVCI): Vp 



(ip) = Yim{ip) for all the 



T.) modes 

to be marginalized over, {ip is here a pixel number.) 
We then explicitly marginalize over them, while estimat- 
ing, e.g., the anisotropy power spectrum (Eq. pl| ). This is 
closely analogous to the time stream frequency marginal- 
ization of Sect. 



IV B 



Another, approximate, way to deal with the problem 
also can be applied directly on the power spectrum es- 
timation stage is implemented in the MADCAP pack- 
age [|9, 39 as described at the end of this Section. 

An alternative exact solution is based on Gorski's 
idea |Q . In this approach the unwanted low-i* modes are 
removed from the map prior to further statistical analysis 
and the corresponding noise correlation matrix in pixel 
domain is appropriately corrected to account for the ad- 
ditional uncertainty. Unlike the other just-mentioned op- 
tions this method produces a 'cleaned' version of the sky 
map to be used henceforth. 

Let us start by defining a scalar product of two func- 
tions / and g defined at each pixel ip of our map rrip as 
(hereafter ★ stands for a complex conjugate), 



(/|g)^— (»p)9(v). 



(46) 



Also we denote by {y} a subset of the {Iq -f 1) spherical 
harmonics of the order not higher than which are to be 
removed form the map. In general {y} is neither a lin- 
early independent nor a complete basis on the map nip. 
However, we can construct a set of orthonormal func- 
tions {i/j} spanning the space of the spherical harmon- 
ics included in {y}. The construction can be performed 
by using Singular Value or Cholesky decomposition of 



the Kowalewski-Gram determinant of {y} functions (as, 
e.g., ^^) and rejecting all the singular modes. The re- 
sulting set of functions though orthonormal is clearly not 
complete. To achieve completeness we can supplement it 
with the functions from the another orthonormal (and 
complete) set of functions defined on the map and re- 
tain from the latter - through a standard Gram-Schmidt 
orthonormalization procedure ~ only those functions (or 
their linear combination) which are orthogonal to all the 
{■0} functions. Practically, that part of the procedure 
can be encoded using Singular Value Decomposition. A 
convenient choice of the extra complete functional basis 
is just a pixel basis, {p\, 



P(i) Up) = S (j, jp) , where i, jp^l, 



(47) 



On the successful completion of the entire procedure we 
end up with the set of the Upix orthonormal functions. 
By construction, all the {ip} functions are included in the 
final basis, and they span all the spherical harmonics of 
the order < on the map rUp. Hence all the remaining 
functions of the final basis (denoted hereafter {^}), which 
are orthogonal to the functions by construction, are 
also orthogonal to the all spherical harmonics of that 
order. 

Algebraically the described procedure consists of a pair 
of linear transformations, which in a pixel representation 
can be written down as, 



Ky, 
Lp 



(48) 
(49) 



Here K changes the basis from that of spherical harmon- 
ics, {y}, to {ip}. L transforms the pixel basis, {p}, to 
the orthogonal basis made of basis vectors either parallel 
or perpendicular to {ip} and retains only the latter sub- 
set, {^}. These functions complement {ip} and both sets 
together form an orthonormal and complete basis {xp, ^} 
on the map, such as 



(y|0 = o. 



(50) 



The map purged of all the contribution from the low 
(< ^o) order spherical harmonics is then given as. 



(51) 



and is therefore uniquely represented by a vector de- 
fined as, 



(i) = (mp||(i)) . 



(52) 



The total (signal plus noise) correlation matrix for is 
then given as. 



(m^ §5 m^'^ 



L{Sp + Np)L^ 



(53) 



where (...) denotes an ensemble average. Sp and Np 
are signal and noise correlation matrices computed for 
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the complete map, rUp, prior to any mode removal, i.e., 
Sp + Np = {irip (g) rUp^). If the sky signal contains only 
CMB anisotropy then the signal term, Sp, is given by. 



Sp^^y^P,{2l+l)Ct, 
47r — ' 



(54) 



where Ci is the power spectrum of the CMB fluctuations, 
and Pi denote matrices of the Legendre polynomials com- 
puted for all the pairs of the pixels in the given map, i.e.. 



Pt {ip, ip) = Pe [cos (7 (ip) ■ 7 (jp))] , 



(55) 



Here 7 (ip) is a unit vector pointing at the center of the 
pixel ip and Pe a standard Legendre polynomial of an 
order £. 

The correlation matrix of the map without low-^ 
modes, as defined in Eq. (|5^), can also be rewritten as a 
sum of a noise-likc, N^, and signal-like, S^, term, 



L [Sp + Np] = S^ + 



(56) 



Furthermore, on introducing the redefined Legendre 
polynomials matrices P^ = LP^L^ and using Eq. (54), 
we can rewrite the signal part of Eq. (|5^ ) as. 



S^ = ^^iP^i^(2£+l)C, 



(57) 



-Y.P^{2£ 



47r 



The last equation shows that the signal correlation ma- 
trix of the 'cleaned' map is related to the angular power 
spectrum of the CMB anisotropies, Ci, in a way which 
is formally identical to that for the complete map, cf., 
Eq. (54). Therefore we can use usual quadratic estima- 
tor algorithms and, in particular, the MADCAP pack- 
age |9| to estimate the CMB power spectrum having 
at disposition only the map (and the corresponding noise 
correlation matrix, Eq. |5^) with low-£ modes removed. 
To use that package in this case one just needs to replace 
matrices Pi (which constitute an intermediate output of 
MADCAP) with matrices P^ and use the cleaned map 
in the ^ representation, m^, and the noise correlation 
matrix, N^, as an input, instead of usual rrip and Np. 

Though the matrices K and L can be chosen to be 
sparse (nearly triangular) the entire procedure is quite 
involved and the computational cost scales as O (rip^^). 
The same scaling applies to the computations of prod- 
ucts of L and Pi matrices. However, while only the 
low £ part of the angular power spectrum is likely to 
be affected by that kind of aliasing, the results of such 
a test applied to a map with relatively large pixels can 
provide a useful estimate of the size of the expected ef- 
fect. In the MAXIMA-I case we applied this test to the 
map with 10' pixels (i.e., less than 4000 pixels in total) 
and compared it with the simple approximate template 
procedure of as implemented in the MADCAP pack- 
age of weighing out the monopole and dipole from the 



map, and introducing an extra low-^ bin to the recovered 
anisotropy power spectrum which is a posteriori rejected. 
In the MAXIMA-I case this bin extended from ^ = 2 up to 
£ — 34. Its rejection corresponds to the marginalization 
over that bin power and would be formally strict only 
if the likelihood for the power spectrum bin amplitudes 
were precisely Gaussian. Though such an approach is less 
general, because of these approximations and because it 
explicitly assumes the isotropy of the sky signal in the 
rejected modes, we found no difference between the re- 
sults of both methods. Therefore, we conclude that the 
MAXIMA-I final anisotropy power spectrum, in the pub- 
lished to date range of multipoles, 35 ^ £ ^ 1235, is not 
affected by any appreciable contribution due to aliasing 
of the low-^ mode power. 



V. ITERATIVE NOISE ESTIMATION. 

To date maxima is one of the most sensitive experi- 
ments in terms of the noise level per measurement achiev- 
ing the level of ^ 1500/zK, i.e., lOO/xKy^, for some of 
the detectors. In spite of that the expected CMB and 
foreground signal for the observed patch of sky is still 
expected to be a sub-dominant part of a single measure- 
ment (^ 10% of the total power). The sky-related con- 
tribution to the power confined within some of the fre- 
quency bands can be, however, much higher ( ^ 10%). 
Similarly, the non-stationary effects, though they may 
appear to be quite small, they can be limited to a num- 
ber of narrow frequency bands, dominating the power 
in there. For instance, in the maxima-i case the pri- 
mary mirror synchronous signal with its amplitude of 
200/iK dominates occasionally the power in the narrow 
frequency bands centered on the fundamental mode of 
primary mirror chop and its few lowest harmonics (see 
Fig.|l|). 

The important assumption behind th e no ise power 

that 



spectrum estimation as presented in Sect. II C 



I.e., 



noise dominates the time stream measurements, though 
not clearly breached needs to be, therefore, tested. 

To account for that effect we follow the iterative ap- 
proach of It attempts to recover the noise compo- 
nent of the entire time stream, which is subseq uent ly used 
in the noise estimation procedure (see Sect. II C). The 
starting point of the iterative procedure is an approxima- 
tion that nt^^) ~ dt- Given this, it proceeds to the noise 
estimation and then to the map making. The resulting 
(zeroth order) map is used as an estimate of the signal in 
the time stream and is subtracted from the time stream 
on the next iterative step ||ll|, . If we denote the noise 
contribution to the time stream on the ith step as n-t^'-* 
then, 



(58) 



where m*^' ^\x^'^ and rfip*-' are a map and a pri- 
mary mirror signal and a generalized map respectively as 
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Estimated noise power spectra for simulated and real data using the iterative approach of Ferreira & Jaffe |llj . In 
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[IC without iterative corrections. 



FIG. 14 

left panel, dashed line shows a noise power spectrum computed as in Sect 
depicts the noise spectrum used for a simulation, that almost perfectly overlaps with a solid line corresponding to a noise power 
spectrum calculated using 4 iterative steps and a map with 8' pixels. The dash-dotted line shows a result after 4 iteration 
but using a map with 3' pixels. That demonstrates the bias as described in the text. Middle panel shows noise power spectra 
for a single segment of the actual maxima-i data. The power spectrum has been recovered using various approaches as in the 
left panel: the solid line shows the spectrum computed with 4 iterative steps and 8' pixels, dashed line shows the uniterated 
estimate, and dash-dotted line shows a result of 4 iterative steps with 3' pixels. The right panel shows noise spectra obta ined 
just as a result of averagingon the second step of the noise estimation procedure, prior to any smoothing (see Sect. [IC and 
the right most panel of Fig. pi). Dashed line corresponds to a spectrum obtained with no iteration, displaying characteristic two 
small spikes at ~ 0.45Hz and 0.9Hz corresponding to a primary mirror chop frequency and its first harmonic. The overplotted 
solid line is a spectrum after 4 iteration using 8' pixels. Clearly in the latter spectrum the spikes are corrected as a result of 
iterations. 



estimated on the previous step. As shown by |Tl] the 
differences between the maps and noise correlation matri- 
ces estimated on the subsequent steps of the iteration de- 
crease very quickly and the required precision is achieved 
rather rapidly. Usually we have found that at most four 
iterative steps were needed to reach the accuracy of few 
percent. The iterative approach can be significantly sped 
up if no explicit inversion of the pixel-pixel noise corre- 
lation matrix is performed on each iterative step but the 
map is calculated using an iterative linear system solver 
p. 

It is important to notice that the method is only 
asymptotically, - i.e., in the limit of the large number 
of effective degrees of freedom - unbiased as guaranteed 
by its maximum likelihood origin. That limit is achieved, 
for instance, when a number of time samples increases, 
but a number of pixels is fixed. 

The ensuing bias can be estimated as follows. Let us 
assume that we know the noise correlation function, Nt- 
Though the noise estimation goal seems to have been 
achieved, the iteration, as given by Eq. (|5^), can go on. 
The time stream estimate on the next step will contain 
noise only but composed of two components: a true time 
stream noise rit — rxt^^'^ and a pixel domain noise, rip, 
projected back to the time domain via the pointing ma- 



trix (and therefore also non-stationary): 

n,t(''+^^ = nt^'' - Aup = nt- Aup. (59) 

The correlations of the latter quantity are computable 
and given by 

i^t)nt^'+'^ (jt)) = Nt {lujt) (60) 

- ^ A{it,ip)Up {ip,jp)A'^ Ut,jp) ■ 
*p -Jp 

Here the summation is over all pairs of pixels and Mp 
stands for the noise correlation matrix in the (general- 
ized) pixel domain corresponding to the true noise corre- 
lation in time domain Nt- (...) denotes an average over 
the statistical ensemble of the noise realizations. 

Interestingly, due to existing correlations between 
noise in the time and pixel domain 'adding' the extra 
noise to the time stream as described by Eq. ( ^9[ ) results 
in the underestimation of the actual noise power in the 
time domain. Though the above formula does not really 
help to unbias the procedure it gives a useful criterion of 
its applicability. For practical use it is useful to consider 
the overall power suppression due to the bias. Taking the 
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trace of Eq. (pOj) we get, 



^(n,(''+i)(z,)n,(^+i)(z,)) - (61) 

it 

^Nt{it,it) - '^nt[ip)Np{ip,ip) . 



Two limiting cases are evident. If each of the pixels is 
observed once then the noise iteration has no meaning, 
as we have no means to distinguish between the noise 
and the sky signal, and its result is fully biased, i.e., the 
left hand side of the above equation is zero. If there is 
no correlation in the time domain then pixel noise is just 
proportional to the number of times a given pixel has 
been observed. Inserting that into Eq. (|6l|) renders, 



nt 



(i+i) 



, {ris 



(62) 



where is a diagonal element of Nt- This formula, 
which in the white noise case can be also derived directly 
from the maximum likelihood considerations, quantifies 
the introduced fractional bias as equal to npix/ris- If the 
number of pixels is fixed then in the limit of the increasing 
number of measurement the bias disappears as expected. 

If the correlations in the time streams are not negligi- 
ble then a similar expression can be concocted with the 
number of pixels replaced by an 'effective' number of the 
parameters n^f f , which is to be determined case by case. 



(63) 



The intuitive meaning of n^f / is clear from the expression 



(64) 



In the MAXIMA-I case Uef / is not smaller than rip^a; , and 
hence correlations tend to increase the bias of the noise 
iteration procedure. 

It is important to notice that due to our assumption of 
stationarity, the bias depends only on how on average the 
measurements are divided between pixels and is therefore 
robust to the presence of the poorly sampled pixels. 

From Eq. ( |6l| ) it is clear that the bias is a result of 
the noise presence in the pixel domain. It is therefore 
advantageous to use all available information to minimize 
the noise of the estimated map, including all the time 
stream data during which a given patch of the sky has 
been observed. 

In the MAXIMA-I case we have found (Fig. |lj) that 
it is still advantageous to perform noise iterations for 
pixels as small as 8 arcminutes. For smaller pixels, the 
resulting map is occasionally too noisy, and the likely 
bias larger than an expected gain. Therefore, in those 



cases, we either restrict iterative corrections to deal only 
with the primary mirror synchronous signal or use noise 
power spectrum estimates obtained for the 8 arcminute 
pixels Q. The latter approach is helpful in correcting 
for the low- angular-scale power, which should constitute 
the bulk of the sky signal present in the time stream. 
However, one may worry that it also introduces spurious 
correlations at the time lags corresponding to the charac- 
teristic crossing time of the big pixel. In practise, we have 
found that the results rendered by both these approaches 
are in very good agreement. 

Clearly, invoking some kind of a map-denoising method 
and/or applying on this stage more aggressive filtering 
can be useful to extend the applicability of iterative noise 
estimation. We leave this issue for future research. 



VI. CONSISTENCY TESTS. 

A number of assumptions and approximations are in- 
volved in the computations of a map and a corresponding 
noise correlation matrix, so it is desirable to test the con- 
sistency of the final map-making products. Clearly that 
is difficult for the maps containing still-to-be-determined 
cosmological contributions without any prior assump- 
tions. However, it can be done for the maps containing 
just noise. Those can be either the projections on the 
sky of the so called 'dark' bolometers usually incorpo- 
rated in experiments to track instrumental effects in the 
data 0, or just differences of the maps computed for 
single detectors. If, as expected, the maps recovered for 
a single photometer contain only the sky signal and the 
noise, the subtraction removes the sky component leaving 
a map of the noise only. Under the assumption of Gaus- 
sian time stream noise, the noise maps are also Gaussian, 
and their correlations are given by the noise correlation 
matrices produced in parallel by a map-making proce- 
dure (Sect. III). Any failure to meet such a requirement 



would suggest either a failure of the basic assumptions 
or some other problem with the data, the data analysis 
methods, or both. 

The noise maps, due to correlations and inhomogeneity 
of the noise, are described by the multi-dimensional prob- 
ability distributions with a number of dimensions equal 
to a number of pixels Upi^ of a map under consideration. 
Therefore, notwithstanding the large number of pixels, 
a simple histogram of the pixel temperatures is likely 
to display significant deviations from the 1-dimensional 
Gaussian distribution even for the truely Gaussian case. 
To alleviate this problem we first 'prewhiten' the map 
performing a linear transformation in order to decorre- 
late the measurements in the different pixels of the map. 



Wr 



N„ 



(65) 



here Np ' is a 'square root' of the noise correlation 
matrix as estimated from the map-making, satisfying a 
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FIG. 15: Probability distribution functions of the pixel amplitudes for the noise maps computed as a difference of two 
maps made for two different photometers. The upper row shows the results using prewhitened maps. Their corresponding 
unprewhitened versions are displayed in the lower row. Histograms show the results for the real data and smooth lines the 
Gaussian curves with a dispersion equal to one (upper panels) or fitted to best match the actual results. The two right panels 
show contour plots of the histograms of joint two-dimensional probability distributions of the pixel temperature for two 'noise' 
maps. These were computed as a difference of the sky maps recovered for two independent pairs of the photometers. The 
lower panel shows the results for the actual maps with the correlated pixel noise, and upper panel for the maps prewhitened 
prior to histogramming. Concentric ellipses marked with thin solid lines show the Gaussian expectation. Those were computed 
assuming no correlations between both maps and were discretized the same way as data histograms and setting the dispersions 
either to unity (upper panel) or to the best fit values (lower panel). 



relation, 



Np-^ EE ATp-i/a iVp-i/2 



(66) 



1/2 

In the following we take Np ' to be a Cholesky tri- 
angular matrix (e.g., [p6|). If the noise correlations 
are estimated correctly (or at least consistently), then 
the components of the vector Wp (a 'prewhitened map') 
are uncorrelated and their variances are equal to unity. 
Thus the multi-dimensional problem reduces to the well- 
defined one-dimensional test. Moreover, if the noise in 
the pixel domain is Gaussian then each of the compo- 
nents of Wp is randomly drawn from the Gaussian 1- 
dimensional distribution with the unit variance. That 
is the hypothesis which we test. We apply a one di- 
mensional Kolmogorov-Smirnov test. Its results give an 
estimate of how often the one point distribution func- 
tion such as the one actually measured can be obtained 
from the Gaussian distribution with a unit variance. For 
MAXIMA-I we have four single-detector maps, giving us 



six independent difference maps. We test all of these 
noise maps, finding that the KS significance is always 
higher than ~ 10% and usually as high as ~ 50 — 60%, 
confirming the very good consistency of our map-making 
products. The sample of the results is also shown in 
Fig. 15. Clearly, a histogram of the unprewhitened map 
shows a significant deviation from the Gaussian curve 
near the peak (lower, left panel of the figure) and in the 
tails of the distribution. As anticipated, prewhitening 
largely resolves the discrepancies, allowing us to recover 
nearly perfectly Gaussian curves. 

Though such KS-like tests usually provide a weak diag- 
nostic, they are powerful consistency tests when passed. 
Sources of possible failure are abundant. Those can be 
either problems of the data set like an extra photometer- 
dependent parasitic signal in the time-stream leaving its 
imprint in the difference maps, or non-Gaussianity of the 
time domain noise, or cross-correlation between maps 
used for the creation of the difference maps. Moreover, 
the fact that one of our maps was actually made of the 
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data obtained by the photometer centered at a higher fre- 
quency (~ 240 GHz) than the others frequency 150 
GHz) also suggests the lack of a substantial frequency- 
dependent sky (e.g., foreground) signal, as expected from 
the choice of the low contamination contrast patch ob- 
served during the maxim A-i flight. Alternatively, the 
source of the problem may lay with the data analysis, 
such as noise misestimation both in the time domain 
and pixel domain as a result of the involved approxi- 
mations. In fact we have found that the results of KS 
tests depend on the precise map-making procedure we 
choose to apply to the real data. In particular, we derive 
somewhat different numbers if, e.g., no noise iteration 
has been performed, or no primary-mirror-synchronous 
signal has been removed. In both cases the differences 
are mainly due to the differences in the long wavelength 
modes present in the maps, which are the most suscepti- 
ble to the details of the map-making algorithm. Never- 
theless, that shows that the KS test possesses sensitivity 
which makes it a useful tool in tracking realistic problems 
in the maps and/or procedure. The KS test can be also 
applied to the maps with the low--^ multipoles removed 
as described in the previous Section. Hence the lack of 
an indication of the problem with the KS test results is 
quite encouraging and may serve as a fairly comprehen- 
sive validation of the final results. 

In addition, for every pair of the noise maps we can 
consider a two-dimensional probability distribution of 
two noise maps calculated as differences of two actual 
single detector maps. Here we use a difference map of 
the first and second detector, mp(i2), and of the third 
and fourth, mp(34) and consider the probability distri- 
bution V (mp(i2),mp(34)). 

If we assume that the signal detected by different de- 
tectors is uncorrelated, the correlation matrix for a pair 
(mp(i2), mp(34)) has a block diagonal structure with 
blocks given by the correlation matrices of each of the 
difference map separately. Prehwitening in such a case is 
simply given by. 



1^p(12) 




A^p(12) 






0, 



(67) 

The two dimensional probability distribution 
^ {'Wp(i2), Wp(^s4)) is then bound to have a unit variance 
and the Gaussian shape unless the cross-correlation term 
is indeed present or any of the previously mentioned 
reasons/ assumptions is not fulfilled. The obtained 
results (right hand panels of Fig. |l^) seem to agree 
well with the expectations and therefore confirming the 
consistency of our analysis. 

VII. SUMMARY. 

Recent CMB data sets have set new challenges for the 
data analysis. Problems are related to a sheer size of new 



data sets and also to a quality of the analysis tools, which 
can make a full use of increasing power of data. 

This paper describes how those challenges were met in 
the analysis of the maxima-i data set. The successful 
production of the final results, as published in the recent 
papers ^ ^, required us to improve on the existing and 
develop and test new methods and tools. Though some of 
them had to be significantly customized to be efficient, 
the others seem to be of a more general character and 
applicability extending to data sets as big as that of the 
forthcoming satellite missions ||, |^ . 

We have focused here on time ordered data manipu- 
lation techniques and map-making algorithms. We have 
presented a comprehensive, consistent approach allowing 
us to recover a map of the sky and to estimate its er- 
ror matrix in realistic circumstances of an actual CMB 
experiment. The highlights include: 

• the time stream noise estimation procedure cou- 
pled together with the gap filling method through 
constrained noise realization; 

• an exact version of the map making code; 

• the statistically sound methods of dealing with the 
time stream filtering of contaminated frequencies 
and time-domain templates. 

The entire suite of the methods presented here amounts 
to a self-consistent approach to building a final map out 
of smaller parts in an efficient way. The construction 
can be halted when the largest map lending itself to the 
statistical analysis (e.g., power spectrum estimation) has 
been built. Dealing with only subsets of an entire data 
set seems to be the only efficient way of searching for, 
understanding and removing any systematic problems. 

CPU time-wise, the presented methods are limited by 
the noise correlation matrix inversion, which requires 
O (rip^^) operation. This is the price to be paid if no 
assumption is made about possible symmetries present 
in the map and a pixel-pixel noise correlation matrix is 
required. This obstacle may not be insurmountable, even 
without sacrificing the generality of the approach. In 
most of the realistic situations such a matrix is expected 
to be rather sparse. Moreover, from the knowledge of a 
scanning strategy and characteristic correlation length in 
the time domain, it can be guessed a priori which matrix 
elements need to be computed. We leave both issues for 
further investigation, noting only here that extensive use 
of a supercomputer and the MADCAP package facilitates 
computations of the maps containing up to ^ 40, 000 [||. 
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We have omitted the extra (prewhitening) filter ma- 
trix, D, introduced in ^ to reduce the number 
of floating point operations in the matrix multipli- 
cations of Eq. (p^. In fact the band-diagonality 
of 'prewhitened' matrices (i.e., M' = DMD^ 
and N^^^ = DNt~^ D^) can be exploited only 
if the products in Eq. ( p^ ) are performed from 
outside inwards, and though that may speed up 
computation of M' the computational cost of 
the subsequent products, i.e., [A^ M')N'^^'^ and 
[A^ M')N'^^^^ {A^ M'Y , offsets any advantages on 
the earlier stage. 



